PhD Thesis - ICINo Ordre: 1879 PhD Thesis Sp´ecialit´e : Informatique Sparse preconditioners for...

No Ordre: 1879

PhD Thesis

Specialite : Informatique

Sparse preconditioners for dense linear

systems from electromagneticapplications

presentee le 23 Avril 2002 a

l’Institut National Polytechnique de Toulouse

par

Bruno CARPENTIERI

CERFACS

devant le Jury compose de :

G. Alleon EADSM. Dayde Professeur a l’ENSEEIHTI. S. Duff Project Leader at CERFACS

Group Leader at Rutherford Appleton Laboratory PresidentL. Giraud CERFACSG. Meurant CEA RapporteurY. Saad Professor at the University of Minnesota RapporteurS. Piperno INRIA-CERMICS

CERFACS report: TH/PA/02/48

i

Acknowledgments

I wish to express my sincere gratitude to Iain S. Duff and Luc Giraudbecause they introduced me to the subject of this thesis and guided myresearch with vivid interest. They taught me the enjoyment both for rigourand for simplicity, and let me experience the freedom and the excitement ofpersonal discovery. Without their professional advice and their trust in me,this thesis would not be possible.

My sincere thanks go to Michel Dayde for his continued support in thedevelopment of my research at CERFACS.

I am grateful to Gerard Meurant and Yousef Saad who accepted to actas referees for my thesis. It was an honour for me to benefit from theirfeedback on my research work.

I wish to thank Guillaume Alleon and Serge Piperno who opened methe door of an enriching collaboration with EADS and INRIA-CERMICS,respectively, and accepted to take part in my jury. Guillaume Sylvand atINRIA-CERMICS deserves thanks for providing me with codes and valuablesupport.

Grateful acknowledgments are made for the EMC Team at CERFACSfor their interest in my work, in particular to Mbarek Fares who providedme with the CESC code, and Francis Collino and Florence Millot for manyfertile discussions.

I would like to thank sincerely all the members of the Parallel AlgorithmsTeam and CSG at CERFACS for their professional and friendly support, andBrigitte Yzel for helping me many times gently. The Parallel AlgorithmsTeam represented a stimulating environment to develop my thesis. I amgrateful to many visitors or colleagues who, at different stages, shared myenjoyment for this research.

Above all, I wish to express my deep gratitude to my family and friendsfor their presence and continued support.

This work was supported by INDAM under a grant ”Borsa di Studio perl’Estero A.A. 1998-’99” (Provvedimento del Presidente del 30 Aprile 1998),and by CERFACS.

- B. C.

iii

To my family

v

Don’t just say ”it is impossible” without putting a sincere effort.

Observe the word ”Impossible” carefully .. You can see ”I’m possible”.

What really matters is your attitude and your perception.

Anonymous

vii

Abstract

In this work, we investigate the use of sparse approximate inversepreconditioners for the solution of large dense complex linear systems arisingfrom integral equations in electromagnetism applications.

The goal of this study is the development of robust and parallelizablepreconditioners that can easily be integrated in simulation codes able totreat large configurations. We first adapt to the dense situation thepreconditioners initialy developed for sparse linear systems. We comparetheir respective numerical behaviours and propose a robust pattern selectionstrategy for Frobenius-norm minimization preconditioners.

Our approach has been implemented by another PhD student in alarge parallel code that exploits a fast multipole calculation for the matrixvector product in the Krylov iterations. This enables us to study thenumerical scalability of our preconditioner on large academic and industrialtest problems in order to identify its limitations. To remove these limitationswe propose an embedded scheme. This inner-outer technique enables tosignificantly reduce the computational cost of the simulation and improvethe robustness of the preconditioner. In particular, we were able to solve alinear system with more than a million unknowns arising from a simulationon a real aircraft. That solution was out of reach with our initial technique.

Finally we perform a preliminary study on a spectral two-levelpreconditioner to enhance the robustness of our preconditioner. Thisnumerical technique exploits spectral information of the preconditionedsystems to build a low-rank update of the preconditioner.

Keywords : Krylov subspace methods, preconditioning techniques,sparse approximate inverse, Frobenius-norm minimization method, nonzeropattern selection strategies, electromagnetic scattering applications,boundary element method, fast multipole method.

Contents

1 Introduction 11.1 The physical problem and applications . . . . . . . . . . . . . 21.2 The mathematical problem . . . . . . . . . . . . . . . . . . . 21.3 Numerical solution of Maxwell’s equations . . . . . . . . . . . 5

1.3.1 Differential equation methods . . . . . . . . . . . . . . 51.3.2 Integral equation methods . . . . . . . . . . . . . . . . 6

1.4 Direct versus iterative solution methods . . . . . . . . . . . . 81.4.1 A sparse approach for solving scattering problems . . 9

2 Iterative solution via preconditioned Krylov solvers of densesystems in electromagnetism 132.1 Introduction and motivation . . . . . . . . . . . . . . . . . . . 132.2 Preconditioning based on sparsification strategies . . . . . . . 18

2.2.1 SSOR . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.2.2 Incomplete Cholesky factorization . . . . . . . . . . . 252.2.3 AINV . . . . . . . . . . . . . . . . . . . . . . . . . . . 382.2.4 SPAI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432.2.5 SLU . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472.2.6 Other preconditioners . . . . . . . . . . . . . . . . . . 50

2.3 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . 50

3 Sparse pattern selection strategies for robust Frobenius-norm minimization preconditioner 533.1 Introduction and motivation . . . . . . . . . . . . . . . . . . . 543.2 Pattern selection strategies for Frobenius-norm minimization

methods in electromagnetism . . . . . . . . . . . . . . . . . . 563.2.1 Algebraic strategy . . . . . . . . . . . . . . . . . . . . 563.2.2 Topological strategy . . . . . . . . . . . . . . . . . . . 583.2.3 Geometric strategy . . . . . . . . . . . . . . . . . . . . 603.2.4 Numerical experiments . . . . . . . . . . . . . . . . . . 61

3.3 Strategies for the coefficient matrix . . . . . . . . . . . . . . . 653.4 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . 683.5 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . 73

ix

x

4 Symmetric Frobenius-norm minimization preconditioners inelectromagnetism 774.1 Comparison with standard preconditioners . . . . . . . . . . . 774.2 Symmetrization strategies for Frobenius-norm minimization

method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 804.3 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . 88

5 Combining fast multipole techniques and approximateinverse preconditioners for large parallel electromagneticscalculations. 915.1 The fast multipole method . . . . . . . . . . . . . . . . . . . . 925.2 Implementation of the Frobenius-norm minimization

preconditioner in the fast multipole framework . . . . . . . . 945.3 Numerical scalability of the preconditioner . . . . . . . . . . . 965.4 Improving the preconditioner robustness using embedded

iterations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1035.5 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . 108

6 Spectral two-level preconditioner 1116.1 Introduction and motivation . . . . . . . . . . . . . . . . . . . 1116.2 Two-level preconditioner via low-rank spectral updates . . . . 114

6.2.1 Additive formulation . . . . . . . . . . . . . . . . . . . 1156.2.2 Numerical experiments . . . . . . . . . . . . . . . . . . 1186.2.3 Symmetric formulation . . . . . . . . . . . . . . . . . . 136

6.3 Multiplicative formulation of low-rank spectral updates . . . 1396.3.1 Numerical experiments . . . . . . . . . . . . . . . . . . 140

6.4 Concluding remarks . . . . . . . . . . . . . . . . . . . . . . . 143

7 Conclusions and perspectives 145

A Numerical results with the two-level spectralpreconditioner 153A.1 Effect of the low-rank updates on the GMRES convergence . 154A.2 Experiments with the operator WH = V H

ε M1 . . . . . . . . . 164A.3 Cost of the eigencomputation . . . . . . . . . . . . . . . . . . 174A.4 Sensitivity of the preconditioner to the accuracy of the

eigencomputation . . . . . . . . . . . . . . . . . . . . . . . . . 179A.5 Experiments with a poor preconditioner M1 . . . . . . . . . . 189A.6 Numerical results for the symmetric formulation . . . . . . . 204A.7 Numerical results for the multiplicative formulation . . . . . . 216

List of Tables

2.1.1 Number of matrix-vector products needed by someunpreconditioned Krylov solvers to reduce the residual by afactor of 10−5. . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.2 Number of iterations using both symmetric and unsymmetricpreconditioned Krylov methods to reduce the normwisebackward error by 10−5 on Example 1. The symbol ’-’ meansthat convergence was not obtained after 500 iterations. Thesymbol ’*’ means that the method is not applicable. . . . . . 20

2.2.3 Number of iterations required by different Krylov solverspreconditioned by SSOR to reduce the residual by 10−5. Thesymbol ’-’ means that convergence was not obtained after 500iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2.4 Number of iterations, varying the sparsity level of A and thelevel of fill-in on Example 1. . . . . . . . . . . . . . . . . . . 25





2.2.9 Number of SQMR iterations, varying the shift parameter forvarious level of fill-in in IC. . . . . . . . . . . . . . . . . . . 34

2.2.10 Number of iterations required by different Krylov solverspreconditioned by AINV to reduce the residual by 10−5. Thesymbol ’-’ means that convergence was not obtained after 500iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2.2.11 Number of iterations required by different Krylov solverspreconditioned by AINV to reduce the residual by 10−5. Thepreconditioner is computed using the dense coefficient matrix.The symbol ’-’ means that convergence was not obtained after500 iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

xi

xii

2.2.12 Number of iterations required by different Krylov solverspreconditioned by SPAI to reduce the residual by 10−5. Thesymbol ’-’ means that convergence was not obtained after 500iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

2.2.13 Number of iterations required by different Krylov solverspreconditioned by SLU to reduce the residual by 10−5. Thesymbol ’-’ means that convergence was not obtained after 500iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.2.1 Number of iterations using the preconditioners based ondense A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.3.2 Number of iterations for GMRES(50) preconditioned withdifferent values for the density of M using the same patternfor A and larger patterns. A geometric approach is adoptedto construct the patterns. The test problem is Example 1.This is representative of the general behaviour observed. . . . 67

3.4.3 Number of iterations to solve the set of test problems. . . . . 713.4.4 CPU time to compute the preconditioners. . . . . . . . . . . . 713.5.5 Number of iterations to solve the set of test models by

using a multiple density geometric strategy to construct thepreconditioner. The pattern imposed on M is twice as denseas that imposed on A. . . . . . . . . . . . . . . . . . . . . . . 74

3.5.6 Number of iterations to solve the set of test models by usinga topological strategy to sparsify A and a geometric strategyfor the preconditioner. The pattern imposed on M is twiceas dense as that imposed on A. . . . . . . . . . . . . . . . . . 75

4.1.1 Number of iterations with some standard preconditionerscomputed using sparse A (algebraic). . . . . . . . . . . . . . . 80

4.2.2 Number of iterations on the test examples using the samepattern for the preconditioners. . . . . . . . . . . . . . . . . 83

4.2.3 Number of iterations for MSym−Frob combined with SQMRusing three times more non-zero in A than in the preconditioner. 83

4.2.4 Number of iterations of SQMR with MSym−Frob withdifferent values for the density of M , using the same patternfor A and larger patterns. The test problem is Example 1. . . 84

4.2.5 Number of iterations of SQMR with MAver−Frob withdifferent values for the density of M , using the same patternfor A and larger patterns. The test problem is Example 1. . . 84

4.2.6 Number of iterations of SQMR with MSym−Frob withdifferent orderings. . . . . . . . . . . . . . . . . . . . . . . . . 85

4.2.7 Number of iterations on the test examples using the samepattern for the preconditioners. An algebraic pattern is usedto sparsify A. . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

xiii

4.2.8 Number of iterations MSym−Frob combined with SQMR usingthree times more non-zero in A than in the preconditioner.An algebraic pattern is used to sparsify A. . . . . . . . . . . . 87

4.2.9 Number of iterations of SQMR with MSym−Frob withdifferent values for the density of M , using the same patternfor A and larger patterns. A geometric approach is adoptedto construct the pattern for the preconditioner and analgebraic approach is adopted to construct the pattern forthe coefficient matrix. The test problem is Example 1. . . . . 87

4.2.10 Number of iterations of SQMR with MAver−Frob withdifferent values for the density of M , using the same patternfor A and larger patterns. A geometric approach is adoptedto construct the pattern for the preconditioner and analgebraic approach is adopted to construct the pattern forthe coefficient matrix. The test problem is Example 1. . . . . 88

4.2.11 Number of iterations of SQMR with MSym−Frob withdifferent ordering. An algebraic pattern is used to sparsifyA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.3.1 Total number of matrix-vector products required to convergeon a sphere on problems of increasing size - tolerance = 10−2.The size of the leaf-boxes in the oct-tree associated with thepreconditioner is 0.125 wavelengths. . . . . . . . . . . . . . . 98

5.3.2 Elapsed time required to build the preconditioner andby GMRES(30) to converge on a sphere on problems ofincreasing size on eight processors on a Compaq Alpha server- tolerance = 10−2. . . . . . . . . . . . . . . . . . . . . . . . . 98

5.3.3 Total number of matrix-vector products required to convergeon an aircraft on problems of increasing size - tolerance =2 · 10−2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.3.4 Elapsed time required to build the preconditioner andby GMRES(30) to converge on an aircraft on problems ofincreasing size on eight procs on a Compaq Alpha server -tolerance = 2 · 10−2. . . . . . . . . . . . . . . . . . . . . . . . 99

5.3.5 Elapsed time to build the preconditioner, elapsed time tosolve the problem and total number of matrix-vector productsusing GMRES(30) on an aircraft with 213084 unknowns -tolerance = 2 · 10−2 - eight processors Compaq, varying theparameters controlling the density of the preconditioner. Thesymbol ’–’ means stagnation after 1000 iterations. . . . . . . . 102

5.3.6 Tests on the parallel scalability of the code relative to theconstruction and application of the preconditioner and to thematrix-vector product operation on problems of increasingsize. The test example is the Airbus aircraft. . . . . . . . . . 103

xiv

5.4.7 Global elapsed time and total number of matrix-vectorproducts required to converge on a sphere with 367500points varying the size of the restart parameters and themaximum number of inner GMRES iterations per FGMRESpreconditioning step - tolerance = 10−2 - eight processorsCompaq. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

5.4.8 Global elapsed time and total number of matrix-vector products required to converge on an aircraft with213084 unknowns varying the size of the restart parametersand the maximum number of inner GMRES iterations perFGMRES preconditioning step - tolerance = 2 · 10−2 - eightprocessors Compaq. . . . . . . . . . . . . . . . . . . . . . . . 105

5.4.9 Total number of matrix-vector products required to convergeon a sphere on problems of increasing size - tolerance = 10−2. 107

5.4.10 Total number of matrix-vector products required to convergeon an aircraft on problems of increasing size - tolerance =2 · 10−2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.2.1 Effect of shifting the eigenvalues nearest zero on theconvergence of GMRES(10) for Example 2. We show themagnitude of successively shifted eigenvalues and the numberof iterations required when these eigenvalues are shifted. Atolerance of 10−8 is required in the iterative solution. . . . . . 120

6.2.2 Effect of shifting the eigenvalues nearest zero on theconvergence of GMRES(10) for Example 5. We show themagnitude of successively shifted eigenvalues and the numberof iterations required when these eigenvalues are shifted. Atolerance of 10−8 is required in the iterative solution. . . . . . 121

6.2.3 Number of iterations required by GMRES(10) preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 for increasing size of the coarse space on Example 1.Different choices are considered for the operator WH . . . . . 125



xv



6.2.8 Number of matrix-vector products required by the IRAMalgorithm to compute approximate eigenvalues nearest zeroand the corresponding right eigenvectors. . . . . . . . . . . . 130

6.2.9 Number of amortization vectors required by the IRAMalgorithm to compute approximate eigenvalues nearest zeroand the corresponding right eigenvectors. The computationof the amortization vectors is relative to GMRES(10) and atolerance of 10−5. . . . . . . . . . . . . . . . . . . . . . . . . . 131

6.2.10 Number of iterations required by GMRES(10)preconditioned by a Frobenius-norm minimization methodupdated with spectral corrections to reduce the normwisebackward error by 10−8 for increasing size of the coarse space.The formulation of Theorem 2 with the choice WH = V H

ε M1

is used for the low-rank updates. The computation of Ritzpairs is carried out at machine precision. . . . . . . . . . . . . 132

A.1.1 Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 for increasing size of the coarse space. . . . . . . . . 154




xvi







A.2.11Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward error by10−8 for increasing size of the coarse space. The formulationof Theorem 2 with the choice WH = V H

ε M1 is used for thelow-rank updates. . . . . . . . . . . . . . . . . . . . . . . . . . 164





xvii













A.2.20 Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward error by10−5 for increasing size of the coarse space. The formulationof Theorem 2 with the choice WH = V H


xviii

A.3.21 Number of matrix-vector products, CPU time andamortization vectors required by the IRAM algorithm tocompute approximate eigenvalues nearest zero and thecorresponding eigenvectors. The computation of theamortization vectors is relative to GMRES(10) and atolerance of 10−5. . . . . . . . . . . . . . . . . . . . . . . . . . 174





A.4.26Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the residual by 10−8 forincreasing size of the coarse space. The formulation ofTheorem 2 with the choice WH = V H

ε M1 is used for thelow-rank updates. The computation of Ritz pairs is carriedout at machine precision. . . . . . . . . . . . . . . . . . . . . 179



xix













xx






ε M1 is used for thelow-rank updates. The same nonzero sructure is imposed onA and M1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189


ε M1 is used for thelow-rank updates. The same nonzero structure is imposed onA and M1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190





xxi










ε M1 is used for thelow-rank updates. The same nonzero sructure is imposed onA and M1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197



xxii






A.6.51 Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 for increasing size of the coarse space. The symmetricformulation of Theorem 2 with the choice W = Vε is used forthe low-rank updates. . . . . . . . . . . . . . . . . . . . . . . 204


xxiii








xxiv


A.6.61 Number of iterations required by SQMR preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 for increasing size of the coarse space. The symmetricformulation of Theorem 2 with the choice W = Vε is used forthe low-rank updates. . . . . . . . . . . . . . . . . . . . . . . 214

A.6.62 Number of iterations required by SQMR preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−5 for increasing size of the coarse space. The symmetricformulation of Theorem 2 with the choice W = Vε is used forthe low-rank updates. . . . . . . . . . . . . . . . . . . . . . . 215

A.7.63 Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 for increasing size of the coarse space. Thepreconditioner is updated in multiplicative form. . . . . . . . 216





xxv






A.7.73 Number of iterations required by SQMR preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward error by10−8 for increasing size of the coarse space. The symmetricformulation of Theorem 2 with the choice W = Vε is usedfor the low-rank updates. The preconditioner is updated inmultiplicative form. . . . . . . . . . . . . . . . . . . . . . . . . 226

A.7.74 Number of iterations required by SQMR preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward error by10−5 for increasing size of the coarse space. The symmetricformulation of Theorem 2 with the choice W = Vε is usedfor the low-rank updates. The preconditioner is updated inmultiplicative form. . . . . . . . . . . . . . . . . . . . . . . . 227

List of Figures

1.3.1 Example of discretized mesh. . . . . . . . . . . . . . . . . . . 8

2.1.1 Meshes associated with test examples. . . . . . . . . . . . . . 302.1.2 Eigenvalue distribution in the complex plane of the coefficient

matrix of Example 3. . . . . . . . . . . . . . . . . . . . . . . . 312.2.3 Pattern structure of the large entries of A. The test problem

is Example 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . 312.2.4 Nonzero pattern for A when the smallest entries are

discarded. The test problem is Example 5. . . . . . . . . . . . 322.2.5 Sensitivity of SQMR convergence to the SSOR parameter ω

for Example 1. . . . . . . . . . . . . . . . . . . . . . . . . . . 322.2.6 Sensitivity of SQMR convergence to the SSOR parameter ω

for Example 4. . . . . . . . . . . . . . . . . . . . . . . . . . . 332.2.7 Incomplete factorization algorithm - M = LDLT . . . . . . . 332.2.8 The spectrum of the matrix preconditioned with IC(1), the

condition number of L, and the number of iterations withSQMR for various values of the shift parameter τ . The testproblem is Example 1 and the density of A is around 3%. . . 35

2.2.9 The eigenvalue distribution on the square [-1, 1] of the matrixpreconditioned with IC(1), the condition number of L, andthe number of iterations with SQMR for various values of theshift parameter τ . The test problem is Example 1 and thedensity of A is around 3%. . . . . . . . . . . . . . . . . . . . . 36

2.2.10The eigenvalue distribution on the square [-0.3, 0.3] of thematrix preconditioned with IC(1), the condition number ofL, and the number of iterations with SQMR for various valuesof the shift parameter τ . The test problem is Example 1 andthe density of A is around 3%. . . . . . . . . . . . . . . . . . 37

2.2.11The biconjugation algorithm - M = ZD−1ZT . . . . . . . . . 392.2.12Sparsity patterns of the inverse of A (on the left) and of the

inverse of its lower triangular factor (on the right), where allthe entries whose relative magnitude is smaller than 5.0×10−2

are dropped. The test problem, representative of the generaltrend, is a small sphere. . . . . . . . . . . . . . . . . . . . . . 44

xxvii

xxviii

2.2.13 Histograms of the magnitude of the entries of the firstcolumn of A−1 and its lower triangular factor. A similarbehaviour has been observed for all the other columns. Thetest problem, representative of the general trend, is a smallsphere. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.2.1 Pattern structure of A−1. The test problem is Example 5. . . 573.2.2 Example of discretized mesh. . . . . . . . . . . . . . . . . . . 593.2.3 Topological neighbours of a DOF in the mesh. . . . . . . . . . 593.2.4 Topological localization in the mesh for the large entries of

A. The test problem is Example 1 and is representative ofthe general behaviour. . . . . . . . . . . . . . . . . . . . . . . 60

3.2.5 Topological localization in the mesh for the large entries ofA−1. The test problem is Example 1 and is representative ofthe general behaviour. . . . . . . . . . . . . . . . . . . . . . . 61

3.2.6 Evolution of the density of the pattern computed forincreasing number of levels. The test problem is Example 1.This is representative of the general behaviour. . . . . . . . . 62

3.2.7 Geometric localization in the mesh for the large entries of A.The test problem is Example 1. This is representative of thegeneral behaviour. . . . . . . . . . . . . . . . . . . . . . . . . 63

3.2.8 Geometric localization in the mesh for the large entries ofA−1. The test problem is Example 1. This is representativeof the general behaviour. . . . . . . . . . . . . . . . . . . . . . 64

3.2.9 Evolution of the density of the pattern computed for largergeometric neighbourhoods. The test problem is Example 1.This is representative of the general behaviour. . . . . . . . . 64

3.2.10Mesh of Example 2. . . . . . . . . . . . . . . . . . . . . . . . 663.3.11Nonzero pattern for A−1 when the smallest entries are

discarded. The test problem is Example 5. . . . . . . . . . . . 673.3.12Sparsity pattern of the inverse of sparse A associated with

Example 1. The pattern has been sparsified with the samevalue of the threshold used for the sparsification of displayedin Figure 3.3.11. . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.3.13CPU time for the construction of the preconditioner usinga different number of nonzeros in the patterns for A and M .The test problem is Example 1. This is representative of theother examples. . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.4.14 Eigenvalue distribution for the coefficient matrixpreconditioned by using a single density strategy onExample 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3.4.15 Eigenvalue distribution for the coefficient matrixpreconditioned by using a multiple density strategy onExample 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

xxix

5.1.1 Interactions in the one-level FMM. For each leaf-box,the interactions with the gray neighbouring leaf-boxes arecomputed directly. The contribution of far away cubes arecomputed approximately. The multipole expansions of faraway boxes are translated to local expansions for the leaf-box;these contributions are summed together and the total fieldinduced by far away cubes is evaluated from local expansions. 94

5.1.2 The oct-tree in the FMM algorithm. The maximum numberof children is eight. The actual number corresponds to thesubset of eight that intersect the object (courtesy of G.Sylvand, INRIA CERMICS). . . . . . . . . . . . . . . . . . . 95

5.1.3 Interactions in the multilevel FMM. The interactions for thegray boxes are computed directly. We denote by dashedlines the interaction list for the observation box, that consistsof those cubes that are not neighbours of the cube itselfbut whose parent is a neighbour of the cube’s parent. Theinteractions of the cubes in the list are computed using theFMM. All the other interactions are computed hierarchicallyon a coarser level, denoted by solid lines. . . . . . . . . . . . . 96

5.3.4 Mesh associated with the Airbus aircraft (courtesy of EADS).The surface is discretized by 15784 triangles. . . . . . . . . . 97

5.3.5 The RCS curve for an Airbus aircraft discretized with 200000unknowns. The problem is formulated using the EFIEformulation and a tolerance of 2·10−2 in the iterative solution.The quantity reported on the ordinate axis indicates the valueof the energy radiated back at different incidence angles. . . . 101

5.3.6 The RCS curve for an Airbus aircraft discretized with 200000unknowns. The problem is formulated using the CFIEformulation and a tolerance of ·10−6 in the iterative solution.The quantity reported on the ordinate axis indicates the valueof the energy radiated back at different incidence angles. . . . 101

5.3.7 Effect of the restart parameter on GMRES stagnation on anaircraft with 94704 unknowns. . . . . . . . . . . . . . . . . . . 102

5.4.8 Inner-outer solution schemes in the FMM context. Sketch ofthe algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.4.9 Convergence history of restarted GMRES for different valuesof restart on an aircraft with 94704 unknowns. . . . . . . . . 106

5.4.10Effect of the restart parameter on FGMRES stagnation onan aircraft with 94704 unknowns using GMRES(20) as innersolver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.2.1 Eigenvalue distribution for the coefficient matrixpreconditioned by the Frobenius-norm minimization methodon Example 2. . . . . . . . . . . . . . . . . . . . . . . . . . . 115

xxx

6.2.2 Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 and 10−5 for increasing size of the coarse space onExample 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119





6.2.7 Number of iterations required by GMRES preconditionedby a Frobenius-norm minimization method updated withspectral corrections to reduce the normwise backward errorby 10−8 for three choices of restart and increasing size of thecoarse space on Example 1. . . . . . . . . . . . . . . . . . . . 123



6.2.10 Eigenvalue distribution for the coefficient matrixpreconditioned by a Frobenius-norm minimization methodon Example 2. The same sparsity pattern is used for A andfor the preconditioner. . . . . . . . . . . . . . . . . . . . . . . 133

xxxi

6.2.11 Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 and 10−5

for increasing size of the coarse space on Example 1. Theformulation of Theorem 2 with the choice WH = V H

ε M1 isused for the low-rank updates. The same nonzero structureis used for A and M1. . . . . . . . . . . . . . . . . . . . . . . 133










6.2.15Number of iterations required by SQMR preconditioned by aFrobenius-norm minimization method updated with spectralcorrections to reduce the normwise backward error by 10−5

for increasing size of the coarse space on Example 1. Thesymmetric formulation of Theorem 2 with the choice W = Vε

is used for the low-rank updates. . . . . . . . . . . . . . . . . 136



is used for the low-rank updates. . . . . . . . . . . . . . . . . 137

xxxii



is used for the low-rank updates. . . . . . . . . . . . . . . . . 1376.2.18Number of iterations required by SQMR preconditioned by a

Frobenius-norm minimization method updated with spectralcorrections to reduce the normwise backward error by 10−5


is used for the low-rank updates. . . . . . . . . . . . . . . . . 1386.2.19Number of iterations required by SQMR preconditioned by a

Frobenius-norm minimization method updated with spectralcorrections to reduce the normwise backward error by 10−5


is used for the low-rank updates. . . . . . . . . . . . . . . . . 1386.3.20 Convergence of GMRES preconditioned by a Frobenius-

norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 and 10−5

for increasing numberof corrections on Example 1. Thesymmetric formulation of Theorem 2 with the choice W = Vε

is used for the low-rank updates. The preconditioner isupdated in multiplicative form. . . . . . . . . . . . . . . . . . 141







Chapter 1

Introduction

This thesis considers the problem of designing effective preconditioningstrategies for the iterative solution of boundary integral equations inelectromagnetism. An accurate numerical solution of these problemsis required in the simulation of many industrial processes, such as theprediction of the Radar Cross Section (RCS) of arbitrarily shaped 3Dobjects like aircrafts, the analysis of electromagnetic compatibility ofelectrical devices with their environment, and many others. In the last20 years, owing to the impressive development in computer technologyand to the introduction of fast methods which require less computationalcost and memory resources, a rigorous numerical solution of many of theseapplications has become possible [29]. Nowadays challenging problems inan industrial setting demand a continuous reduction in the computationalcomplexity of the numerical methods employed; the aim of this research isto investigate the use of sparse linear algebra techniques (with particularemphasis on preconditioning) for the solution of dense linear systemsof equations arising from scattering problems expressed in an integralformulation.

In this chapter, we illustrate the motivation of our research, and wepresent the major topics discussed in the thesis. In Section 1.1, we describethe physical problem we are interested in, and give some examples ofapplications. In Section 1.2, we formulate the mathematical problem and,in Section 1.3, we overview some of the principal approaches generally usedto solve scattering problems. Finally, in Section 1.4, we discuss direct anditerative solution strategies and introduce some issues relevant to the designof the preconditioner.

1

2 1. Introduction

1.1 The physical problem and applications

Electromagnetic scattering problems address the physical issue of detectingthe diffraction pattern of the electromagnetic radiation scattered from alarge and complex body when illuminated by an incident incoming wave.A good understanding of these phenomena is crucial to the design ofmany industrial devices like radars, antennae, computer microprocessors,optical fibre systems, cellular telephones, transistors, modems, and so on.Electronic circuits produce and are subject to electromagnetic interference,and ensuring reduced radiation and signal distortion have become twomajor issues in the design of modern electronic devices. The increase ofcurrents and frequencies in industrial simulations makes electromagneticcompatibility requirements more difficult to meet and demands an accurateanalysis previous to the design phase.

The study of electromagnetic scattering is required in radar applications,where a target is illuminated by incident radiation and the energy radiatedback to the radar is analysed to retrieve information on the target. In fact,the amount of the radiated energy depends on the radar cross-section ofthe target, on its shape, on the material of which it is composed, and onthe wavelength of the incident radiation. Radar measurements are vitalfor estimating surface currents in oceanography, for mapping precipitationareas and detecting wind direction and speed in meteorological andclimatic studies, as well as in the production of accurate weather forecasts,geophysical prospecting from remote sensing data, wireless communicationand bioelectromagnetics. In particular, the computation of radar cross-section is used to identify unknown targets as well as to design stealthtechnology.

Modern targets reduce their observability features by using newmaterials. Engineers design, develop and test absorbing materials whichcan control radiation, reduce signatures of military systems, preservecompatibility with other electromagnetic compatibility devices, isolaterecording studios and listening rooms. A good knowledge of theelectromagnetic properties of materials can be critical for economiccompetitiveness and technological advances in many industrial sectors. Allthese simulations can be very demanding in terms of computer resources;they require innovative algorithms and the use of high performancecomputers to afford a rigorous numerical solution.

1.2 The mathematical problem

The mathematical formulation of scattering problems relies on Maxwell’sequations, originally introduced by James Maxwell in 1864 in the article ADynamical Theory of the Electromagnetic Field [103] as 20 scalar equations.

1.2. The mathematical problem 3

Maxwell’s equations were reformulated in the 1880s as a set of four vectordifferential equations, describing the time and space evolution of the electricand the magnetic field around the scatterer. They are:

∇×H = J +∂D∂t

,

∇×E = −∂B∂t

,

∇ ·D = ρ,∇ ·B = 0.

(1.2.1)

The vector fields which appear in (1.2.1) are the electric field E(x,t), themagnetic field H(x,t), the magnetic flux density B(x,t) and the electric fluxdensity D(x,t). Equations (1.2.1) involve also the current density J(x,t) andthe charge density ρ(x, t). Given a vector field A represented in Cartesiancoordinates in the form A(x, y, z) = Ax(x, y, z)i+Ay(x, y, z)j+Az(x, y, z)k,the components of the curl operator ∇×A are

(∇×A)x =∂Az

∂y− ∂Ay

∂z,

(∇×A)y =∂Ax

∂z− ∂Az

∂x,

(∇×A)z =∂Ay

∂x− ∂Ax

∂y.

The divergence operator ∇ ·A in Cartesian coordinates is

∇ ·A =∂Ax

∂x+

∂Ay

∂y+

∂Az

∂z.

The continuity equation, which expresses the conservation of charge,relates the quantities J and ρ

∂ρ

∂t+∇ · J = 0.

In an isotropic conductor the current density is related to the electric fieldby Ohm’s law:

J = σE,

where σ(x) is called the electric conductivity. If σ is nonzero, the medium iscalled a conductor, whereas if σ = 0 the medium is referred to as a dielectric.Relations also exist between D and E, B and H, and are determined by thepolarization and magnetization properties of the medium containing thescatterer; in a linear isotropic medium we have

D = εE, B = µH,

4 1. Introduction

where the functions ε(x) and µ(x) are the electric permittivity and themagnetic permeability, respectively. In a vacuum D = E, and B = H.This equality can be assumed valid, up to some approximation, when themedium is the air. In this case, Maxwell’s equations can be simplified andread:

∇×H = J +∂E∂t

,

∇×E = −∂H∂t

,

∇ ·E = ρ,∇ ·H = 0.

(1.2.2)

Boundary conditions are associated with system (1.2.2) to describedifferent physical situations. For scattering from perfect conductors, whichrepresents an important model problem in industrial simulations, the electricfield vanishes inside the object and the total tangential electric field on thesurface of the scatterer is zero. Absorbing radiation conditions at infinityare imposed, like the Silver-Muller radiation condition [25]

limr→∞ (Hs × x− rEs) = 0

uniformly in all directions x = x/|x|, where r = |x| and Hs and Es are thescattered part of the fields.

A further simplification comes when Maxwell’s equations are formulatedin the frequency domain rather than in the time domain. Since the sum oftwo solutions is still a solution, Fourier transformations can be introducedto remove time-dependency from system (1.2.2), and to write it in theform of a set of several time-independent systems, each corresponding toone fixed value of frequency. All the quantities in (1.2.2) are assumed tohave harmonic behaviour in time, that is they can be written in the formA(x, t) = A(x)eiωt (ω is a constant) and their time dependency is completelydetermined by the amplitude and relative phase. For a dielectric body thenew system assumes the form:

∇×H = +iωE,∇×E = −iωH,∇ ·E = 0,∇ ·H = 0.

(1.2.3)

where now E = E(x) and H = H(x). Here ω = ck = 2πc/λ is referred toas the angular frequency, k as the wave number and λ as the wavelength ofthe electromagnetic wave. The constant c is the speed of light.

1.3. Numerical solution of Maxwell’s equations 5

1.3 Numerical solution of Maxwell’s equations

A popular solution approach eliminates the magnetic field H from (1.2.3)and obtains a vector Helmholtz equation with a divergence condition:

{∆E + k2E = 0,∇ ·E = 0.

(1.3.4)

Systems (1.3.4) are challenging to solve. An analytic solution can becomputed when the geometry of the scatterer is very regular, as in the caseof a sphere or a spheroid. More complicated boundaries require the use ofnumerical techniques.

Objects of interest in industrial applications generally have largedimension in terms of wavelength, and the computation of their scatteringcross section can be very demanding in terms of computer resources. Untilthe emergence of high-performance computers in the early eighties, thesolution was afforded by using approximate high frequency techniques suchas the shooting and bouncing ray method (SBR) [101]. Basically, ray-based asymptotic methods like SBR and uniform theory of diffraction arebased on the idea that EM scattering becomes a localized phenomenonas the size of the scatterer increases with respect to the wavelength. Inthe last 20 years, the impressive advance in computer technology and theintroduction of fast methods which have less computational and memoryrequirement, have made a rigorous numerical solution affordable for manypractical applications. Nowadays, computer scientists generally adopt twodistinct approaches for the numerical solution, based on either differentialor integral equation methods.

1.3.1 Differential equation methods

The first approach solves system (1.3.4) for the electric field surrounding thescatterer by differential equation methods. Classical discretization schemeslike the finite-element method (FEM) [125, 145]) or the finite-differencemethod (FDM) [99, 137] can be used to discretize the continuous modeland give rise to a sparse linear system of equations. The domain outsidethe object is truncated and an artificial boundary is introduced to simulatean infinite volume [20, 83, 85]. Absorbing boundary conditions do not alterthe sparsity structure in the matrix from the discretization but have tobe imposed at some distance from the scatterer. More accurate exteriorboundary conditions, based on integral equations, allow us to bring theexterior boundary of the simulation region closer to the surface of thescatterer and to limit the size of the linear system to solve [89, 104]. Asthey are based on integral equations, they result in a part of the matrixbeing dense in the final system which can increase the overall solution cost.

6 1. Introduction

The discretization of large 3D domains may suffer from grid dispersionerrors, which occur when a wave has a different phase velocity on thegrid compared to the exact solution [9, 90, 100]. Grid dispersion errorsaccumulate in space and, for 2D and 3D problems over large simulationregions, their effect can be troublesome, introducing spurious solutionsin the computation. The effect of grid dispersion errors can be reducedby using finer grids or higher-order accurate differential equation solvers,which substantially increase the problem size, or by coupling the differentialequation solver with an integral equation solver.

Because of the sparsity structure of the discretization matrix, differentialequation methods have become popular solution methods for EM problems.

1.3.2 Integral equation methods

An alternative class of methods is represented by integral equation solvers.Using the equivalence principle, system (1.3.4) can be recast in the form offour integral equations which relate the electric and magnetic fields E and Hto the equivalent electric and magnetic currents J and M on the surface ofthe object. Integral equation methods solve for the induced currents globally,whereas differential equation methods solve for the fields. The electric-fieldintegral equation (EFIE) expresses the electric field outside the object EE

in terms of the induced current J. In the case of harmonic time dependencyit reads

E(x) = −∫

Γ∇G(x, x′)ρ(x′)d3x′− ik

c

∫

ΓG(x, x′)J(x′)d3x′+EE(x) (1.3.5)

where EE is the electric field due to external sources, and G is the Green’sfunction for scattering problems:

G(x, x′) =e−ik|x−x′|

|x− x′| .

The EFIE provides a first-kind integral equation which is well knownto be ill-conditioned, but it is the only integral formulation that can beused for open targets. Another formulation, referred to as the magnetic-field integral equation (MFIE), expresses the magnetic field outside theobject in terms of the induced current and allows the calculation of themagnetic field outside the object. Both formulations suffer from interiorresonances, which can make the numerical solution more problematic atsome frequencies known as resonant frequencies. The problem of interiorresonances is particularly troubling for large objects. A possible remedy is tocombine linearly the EFIE and MFIE formulation. The resulting equation,known as the combined-field integral equation (CFIE), does not suffer frominternal resonance and is much better conditioned as it generally providesan integral equation of the second-kind, but can be used only for closed

1.3. Numerical solution of Maxwell’s equations 7

targets. Owing to these nice properties, the use of the CFIE formulation isconsidered mandatory for closed surfaces.

The resulting EFIE, MFIE and CFIE are converted into matrix equationsby the Method of Moments [86]. The unknown current J(x) on the surfaceof the object is expanded into a set of basis functions Bi, i = 1, 2, ..., N

J(x) =N∑

i=1

JiBi(x).

This expansion is introduced in (1.3.5), and the discretized equation isapplied to a set of test functions. A linear system of equations is finallyobtained, whose unknowns are the coefficients of the expansion. The entriesin the coefficient matrix are expressed in terms of surface integrals andassume the simplified form

AKL =∫ ∫

G(x, y)BK(x) ·BL(y)dL(y)dK(x). (1.3.6)

When m-point Gauss quadrature formulae are used to compute the surfaceintegrals in (1.3.6), the entries of the coefficient matrix have the form

AKL =m∑

i=1

m∑

j=1

ωiωjG(xKi , yLj )BK(xKi) ·BL(yLj ).

The resulting linear system is dense and complex, unsymmetric in the caseof MFIE and CFIE, symmetric but non-Hermitian in the case of the EFIEformulation.

For homogeneous or layered homogeneous dielectric bodies, integralequations are discretized on the surface of the object or at the discontinuousinterfaces between two different materials. Thus the number of unknowns isgenerally much smaller when compared to the discretization of large 3Dspaces by finite-difference or finite-element methods. However, a globalcoupling of the induced currents in the problem results in dense matrices.The cost of the solution associated with these dense matrices has for a longtime precluded the popularity of integral solution methods in EM. In recentyears, the application in the context of the study of radar targets of differentmaterials and the availability of larger computer resources have motivatedan increasing interest towards integral methods.

Throughout this thesis, we focus on preconditioning strategies for theEFIE formulation of scattering problems. In the integral equation contextthat we consider, the problems are discretized by the Method of Momentsusing the Rao-Wilton-Glisson (RWG) basis functions [116]. The surfaceof the object is modelled by a triangular faceted mesh (see Figure 1.3.1),and each RWG basis is assigned to one interior edge in the mesh. Eachunknown in the problem represents the vectorial flux across each edge in the

8 1. Introduction

triangular mesh. The total number of unknowns is given by the number ofinterior edges, which is about one and a half times the number of triangularfacets. In order to have a correct approximation to the oscillating solutionof the Maxwell’s equations, physical constraints impose that the averageedge length a has to be between 0.1λ and 0.2λ, where λ is the wavelengthof the incoming wave [11]. Two factors mainly affect the dimension N ofthe linear system to solve, namely the total surface area and the frequencyof the problem. For a given target the size of the system is proportionalto the square of the frequency, and the memory cost for the storage ofthe N2 complex numbers of the full discretization matrix is proportionalto the fourth power of the frequency. This cost increases drastically whenfine discretization is required, as in the case for rough geometries, and canmake the numerical solution of medium size problems unaffordable even onmodern computers. Nowadays a typical electromagnetic problem in industrycan have hundred of thousands or a few million unknowns.

Figure 1.3.1: Example of discretized mesh.

1.4 Direct versus iterative solution methods

Direct methods are often the method of choice for the solution ofthese systems in an industrial environment because they are reliable andpredictable both in terms of accuracy and cost. Dense linear algebrapackages such as LAPACK [5] provide reliable implementations of LUfactorization attaining good performance on modern computer architectures.In particular, they use Level 3 BLAS [51, 52] for block operations which

1.4 Direct versus iterative solution methods 9

enable us to exploit data locality in the cache memory. Except when thegeometries are very irregular, the coefficient matrices of the discretizedproblem are not very ill-conditioned, and direct methods compute fairlyaccurate solutions. The factorization can be performed once and then isreused to compute a solution for all excitations. In industrial simulations,objects are illuminated at several, slightly different incidence directions,and hundred of thousands of systems have often to be solved for the sameapplication, all having the same coefficient matrix and a different right-handside.

For the solution of large-scale problems, direct methods becomeimpractical even on large parallel platforms because they require storageof N2 single or double precision complex entries of the coefficient matrixand O(N3) floating-point operations to compute the factorization, where Ndenotes the size of the linear system. Some direct solvers with reducedcomputational complexity have been introduced for the case when thesolution is sought for blocks of right-hand sides, like the EADS out-of-coreparallel solver [1], the Nested Equivalence Principle Algorithm (NEPAL) [30,31] and the Recursive Aggregate T-Matrix Algorithm (RATMA) [31, 32], butthe computational cost remains a bottleneck for large-scale applications.Although, in the last twenty years, computer technology has gone fromflops to Gigaflops, that is a speedup factor of 109, the size of the largestdense problems solved on current architectures increased by only a factor ofthree [56, 57].

1.4.1 A sparse approach for solving scattering problems

It can be argued that all large dense matrices hide some structure behindtheir N2 entries. The structure sometimes emerges naturally at thematrix level (Toeplitz, circulant, orthogonal matrices) and sometimes canbe identified from the origin of the problem. When the number of unknownsis large, the discretized problem reflects more closely the properties of thecontinuous problem, and the entries of the discretization matrix are farfrom arbitrary. Exploiting this structure can enable the use of sparse linearalgebra techniques and lead to a sensible reduction of the overall solutioncost. The use of iterative methods can be promising from this viewpointbecause they simply require a routine to compute matrix-vector productsand do not need the knowledge of all the entries of the coefficient matrix.Special properties of the problem can be profitably used to reduce thecomputational cost of this procedure. Under favourable conditions, iterativemethods improve the approximate solution at each step. When the requiredaccuracy is obtained, one can stop the iteration.

In the last decades, active research efforts have been devoted tounderstanding theoretical and numerical properties of modern iterativesolvers. Although they still cannot compete with direct solvers in terms of

10 1. Introduction

robustness, they have been successfully used in many contexts. In particular,it is now established that iterative solvers have to be used with some form ofpreconditioning to be effective on challenging problems, like those arising inindustry (see, for instance, [2, 41, 60, 146]). Provided we have fast matrix-vector multiplications and robust preconditioners the iterative solution viamodern Krylov solvers can be an alternative to direct methods.

There are active research efforts on fast methods [4, 82] to perform fastmatrix-vector products with O(N log N) computational complexity. Thesemethods, generally referred to as hierarchical methods, were introducedoriginally in the context of the study of particle simulations as a wayto reduce costs and enable the solution of large problems, or to demandmore accuracy in the computation [6, 8]. Hierarchical methods can beeffective on boundary element applications, and many research efforts havebeen successful in this direction, including strategies for parallel distributedmemory implementations [45, 46, 47, 79, 80].

In this thesis, we focus on the other key component of Krylov methodsin this context; that is, we study the design of robust preconditioningtechniques. The design of the preconditioner is generally very problem-dependent and can take great advantage of a good knowledge of theunderlying physical problem. General purpose preconditioners can fail onspecific classes of problems, and for some of them a good preconditioner isnot known yet. A preconditioner M is required to be a good approximationof A in some sense (or of A−1, depending on the context), to be easy tocompute and cheap to store and to apply. For electromagnetic scatteringproblems expressed in integral formulation, some special constraints inaddition to normal constraints are required. For large problems the useof fast methods is mandatory for the matrix-vector products. When fastmethods are used, the coefficient matrix is not completely stored in memoryand only some of the entries, corresponding to the near-field interactions, areexplicitely computed and available for the construction of the preconditioner.Hierarchical methods are often implemented in parallel, partitioning thedomain among different processors and the matrix-vector products arecomputed in a distributed manner, trying to meet the goal of both loadbalancing and reduced communications. Thus, parallelism is a relevantfactor to consider in the design of the preconditioner. Nowadays the typicalproblem size in electromagnetic industry is continually increasing, and theeffectiveness of preconditioned Krylov subspace solvers should be combinedwith the property of numerical scalability; that is, the numerical behavior ofthe preconditioner should not depend on the mesh size or on the frequencyof the problem. Finally, matrices arising from the discretization of integralequations can be highly indefinite and many standard preconditioners canexhibit surprisingly poor performance.

This manuscript is structured as follows. In Chapter 2, we establish theneed for preconditioning linear systems of equations which arise from the

1.4 Direct versus iterative solution methods 11

discretization of boundary integral equations in electromagnetism, and wetest and compare several standard preconditioners computed from a sparseapproximation of the dense coefficient matrix. We study their numericalbehaviour on a set of model problems arising from both academic and fromindustrial applications, and gain some insight on potential causes of failure.In Chapter 3, we focus our analysis on sparse approximate inverse methodsand we propose some efficient static nonzero pattern selection strategies forthe construction of a robust Frobenius-norm minimization preconditioner inelectromagnetism. We introduce suitable strategies to identify the relevantentries to consider in the original matrix A, as well as an appropriatesparsity structure for the approximate inverse. In Chapter 4, we illustratethe numerical and computational efficiency of the proposed preconditioneron a set of model problems, and we complete the study consideringtwo symmetric preconditioners based on Frobenius-norm minimization.In Chapter 5, we consider the implementation of the Frobenius-normminimization preconditioner within the code that implements the FastMultipole Method (FMM). We combine the sparse approximate inversepreconditioner with fast multipole techniques for the solution of hugeelectromagnetic problems. We study the numerical and parallel scalabilityof the implementation and we investigate the numerical behaviour of inner-outer iterative solution schemes implemented in a multipole context withdifferent levels of accuracy for the matrix-vector products in the inner andouter loops. In Chapter 6, we introduce an algebraic multilevel strategybased on low-rank updates for the preconditioner computed by using spectralinformation of the preconditioned matrix. We illustrate the computationaland numerical efficiency of the algorithm on a set of model problemsthat is representative of real electromagnetic calculation. We finally drawsome conclusions arising from the work and address perspectives for futureresearch.

12 1. Introduction

Chapter 2

Iterative solution viapreconditioned Krylovsolvers of dense systems inelectromagnetism

In this chapter we establish the need for preconditioning linear systems ofequations which arise from the discretization of boundary integral equationsin electromagnetism. In Section 2.1, we illustrate the numerical behaviourof iterative Krylov solvers on a set of model problems arising both fromindustrial and from academic applications. The numerical results suggestthe need for preconditioning to effectively reduce the number of iterationsrequired to obtain convergence. In Section 2.2, we introduce the idea ofpreconditioning based on sparsification strategies, and we test and compareseveral standard preconditioners computed from a sparse approximation ofthe dense coefficient matrix. We study their numerical behaviour on modelproblems and gain some insight on potential causes of failure.

2.1 Introduction and motivation

In this section we study the numerical behaviour of several iterativesolvers for the solution of linear systems of the form

Ax = b (2.1.1)

where the coefficient matrix A arises from the discretization of boundaryintegral equations in electromagnetism. Among different integral

13

14 2. Iterative solution via preconditioned Krylov solvers ...

formulations here we focus on the EFIE formulation 1.3.5, because it is moregeneral and more difficult to solve. We use the following Krylov methods:

• restarted GMRES [123];

• Bi-CGSTAB [142] and Bi-CGSTAB(2) [129];

• symmetric [69], nonsymmetric [67] and transpose-free QMR [66];

• CGS [131].

As a set of model problems for the numerical experiments we considerthe following geometries, arising both from academic and from industrialapplications, that are representative of the general numerical behaviourobserved. For physical consistency we have set the frequency of the wave sothat there are about ten discretization points per wavelength [11].

Example 1: a cylinder with a hollow inside, a matrix of order n = 1080,see Figure 2.1.1(a);

Example 2: a cylinder with a break on the surface, a matrix of ordern = 1299, see Figure 2.1.1(b);

Example 3: a satellite, a matrix of order n = 1701, see Figure 2.1.1(c);

Example 4: a parallelopiped, a matrix of order n = 2016, seeFigure 2.1.1(d); and

Example 5: a sphere, a matrix of order n = 2430, see Figure 2.1.1(e).

The first three examples are considered because they can berepresentative of real industrial simulations. The geometries of Examples 4and 5 are very regular, and they are mainly introduced to study thenumerical behaviour of the proposed methods on smooth surfaces. In spiteof their small dimension, these problems are not easy to solve. Except fortwo of the model problems, the sphere and the parallelopiped, the otherproblems are tough because their geometries have open surfaces. Largerproblems will be examined in Chapter 5 when we consider the multipolemethod.

2.1. Introduction and motivation 15

(a) Example 1 (b) Example 2

(c) Example 3 (d) Example 4

(e) Example 5

Figure 2.1.1: Meshes associated with test examples.


Table 2.1.1 shows the number of matrix-vector products needed byeach of the solvers to reduce the residual by 10−5. This tolerance can beaccurate for engineering purposes, as it enables to localize fairly accuratelythe distribution of the currents on the surface of the object. In each case,we take as initial guess x0 = 0, and the right-hand side such that the exactsolution of the system is known. In the GMRES code [63] and the symmetricQMR code [62] (referred to as SQMR in the forthcoming tables), iterationsare stopped when, for the current approximation xm, the computed value of

‖rm‖2α‖xm‖2+β

satisfied a fixed tolerance. Here rm is the residual vector rm = b − Axm,and standard choices for constants α and β in backward error analysisare α = ‖A‖2 and β = ‖b‖2. In all our tests we use α = 0 andβ = ‖b‖2 = ‖r0‖2 because of initial guess. For CGS and Bi-CGSTAB, weuse the implementations provided by HSL 2000 [87] subroutines MI06 andMI03 respectively, suitably adapted to complex arithmetic. These routinesaccept the current approximation xm when

‖b−Axm‖ ≤ max(‖b−Ax0‖2 · ε1, ε2),

where ε1 and ε2 are user-defined tolerances. In our case we take ε1 asequal to the required accuracy, and ε2 = 0.0. For Bi-CGSTAB(2) weuse the implementation developed by D. Fokkema of the Bi-CGSTAB(l)algorithm, which introduces some enhancements to improve stability androbustness, as explained in [127] and [128]. The algorithm stops iterationswhen the relative residual norm ‖rn‖2/‖r0‖2 becomes smaller than a fixedtolerance. In the tests with nonsymmetric QMR (referred to as UQMRin the forthcoming tables) and TFQMR, we use, respectively, the ZUCPLand ZUTFX routines provided in QMRPACK [70]. In particular, ZUCPLimplements a double complex nonsymmetric QMR algorithm based on thecoupled two-term look-ahead Lanczos variant (see [68]). Both ZUCPLand ZUTFX stop iterations when the relative residual norm ‖rn‖2/‖r0‖2

becomes smaller than a fixed tolerance. Notice that, since x0 = 0, all thestopping criteria are equivalent, allowing fair comparison among all thosemethods. All the numerical experiments reported in this section correspondto runs on a Sun workstation in double complex arithmetic and Level 2BLAS operations are used to carry out dense matrix-vector products. Inconnection with GMRES, we test different values of the restart m, from 10up to 110. We recall that each iteration involves one matrix-vector productfor restarted GMRES and SQMR, two for Bi-CGSTAB and CGS, three forUQMR and four for TFQMR.


Example SizeGMRES(m) Bi -

CGStabm=10 m=30 m=50 m=80 m=110

1 1080 +1000 +1000 600 255 204 4452 1299 +1000 826 589 403 292 7173 1701 +1000 824 651 556 493 +10004 2016 426 232 195 160 149 3545 2430 +1000 356 238 148 127 303

Example SizeBi -

CGStab(2)SQMR UQMR TFQMR CGS

1 1080 320 149 693 700 3302 1299 404 186 +1000 996 4383 1701 668 345 +1000 +1000 4184 2016 228 92 514 470 2565 2430 284 98 511 434 284

Table 2.1.1: Number of matrix-vector products needed by someunpreconditioned Krylov solvers to reduce the residual by a factor of 10−5.

Except for SQMR, all the other solvers exhibit very slow convergenceon the first three examples which correspond to irregular geometries andare more difficult to solve. The last two examples are easier because thegeometries are very regular, however the iterative solution is still expensivein terms of number of matrix-vector products. These experiments reveal theremarkable robustness of SQMR that clearly outperforms non-symmetricsolvers on all the test cases, even GMRES for large restarts. The resultsalso reveal the good performance of Bi-CGSTAB(2) compared to thestandard Bi-CGSTAB method which generally requires at least one thirdmore matrix-vector products to converge. On the most difficult problems,slow convergence is essentially due to the bad spectral properties of thecoefficient matrix. Figure 2.1.2 plots the distribution of eigenvalues in thecomplex plane for Example 3; the eigenvalues are scattered from the left tothe right of the spectrum, many of them have large negative real part andno clustering appears. Such a distribution is not at all favourable for therapid convergence of Krylov solvers.

Krylov methods look for the solution of the system in the Krylov spaceKk(A, b) = span{b, Ab,A2b, ..., Ak−1b}. This is a good space from whichto construct approximate solutions for a nonsingular linear system becauseit is intimately related to A−1. The inverse of any nonsingular matrixA can be written in terms of powers of A with the help of the minimalpolynomial q(t) of A, which is the unique monic polynomial of minimum


−45 −40 −35 −30 −25 −20 −15 −10 −5 0 5−0.5

0

0.5

1

1.5

2

2.5

3

3.5

Real axis

Imag

inar

y ax

is

Figure 2.1.2: Eigenvalue distribution in the complex plane of the coefficientmatrix of Example 3.

degree such that q(A) = 0. If the minimal polynomial of A has degree m,then the solution of Ax = b lies in the space Km(A, b). Consequently, thesmaller the degree of the minimal polynomial, the faster the expected rateof convergence of a Krylov method (see [88]). If preconditioning A by anonsingular matrix M causes the eigenvalues of M−1A to fall into a fewclusters, say t of them, whose diameters are small enough, then M−1Abehaves numerically like a matrix with t distinct eigenvalues. As a result,we would expect t iterations of a Krylov method to produce reasonablyaccurate approximations. It has been shown in [74, 122, 148] that inpractice, with the availability of a high quality preconditioner, the choice ofthe Krylov subspace accelerator is not so critical.

2.2 Preconditioning based on sparsificationstrategies

A preconditioner M should satisfy the following demands:

• M is a good approximation to A in some sense (sometimes to A−1,depending on the context);

• the construction and storage of M is not expensive;

2.2. Preconditioning based on sparsification strategies 19

• the system Mx = b is much easier to solve than the original one.

The transformed preconditioned system has the form M−1Ax = M−1b ifpreconditioning from the left, and AM−1y = b , with x = M−1y, whenpreconditioning from the right. For a preconditioner M given in the formM = M1M2, it is also possible to consider the two-sided preconditionedsystem M−1

1 AM−12 z = M−1

1 b, with x = M−12 z.

Most of the existing preconditioners can be divided into either implicit orexplicit form. A preconditioner is said to be of implicit form if its application,within each step of an iterative method, requires the solution of a linearsystem; it is implicitly defined by any nonsingular matrix M ≈ A. The mostimportant example of this class is represented by incomplete factorizationmethods, where M is implicitly defined by M = LU , L and U are generallytriangular matrices that approximate the exact L and U factors from astandard factorization of A according to some dropping strategy adoptedduring the factorization. It is well known that these methods are sensitive toindefiniteness in the coefficient matrix A and can lead to unstable triangularsolves and very poor preconditioners (see [34]). Another important drawbackof ILU -techniques is that they are not naturally suitable for a parallelimplementation since the sparse triangular solves can lead to a severedegradation of performance on vector and parallel machines.

Explicit preconditioning techniques try to mitigate such difficulties.They directly approximate A−1 as the product M of sparse matrices, so thatthe preconditioning operation reduces to forming one or more matrix-vectorproducts. Consequently the application of the preconditioner should beeasier to parallelize, with different strategies depending on the particulararchitecture. In addition, some of these techniques can also perform theconstruction phase in parallel. On certain indefinite problems with largenonsymmetric parts, these methods have provided better results thantechniques based on incomplete factorizations (see [35]), representing anefficient alternative to the solution of difficult applications. A comparisonof approximate inverse and ILU can be found in [76].

In the next sections, we study the numerical behaviour of severalstandard preconditioners both of implicit and of explicit form incombination with Krylov methods for the solution of systems (2.1.1).All the preconditioners are computed from a sparse approximation of thedense coefficient matrix. On general problems, this approach can causea severe deterioration of the quality of the preconditioner; in the BEMcontext, it is likely to be more effective since a very sparse matrix can retainthe most relevant contributions to the singular integrals. In Figure 2.2.3we depict the pattern structure of the large entries in the discretizationmatrix for Example 5, which is representative of the general trend. Largeto small entries are depicted in different colours, from red to green, yellow


and blue. The picture shows that, in the discretization matrix, only asmall set of entries generally have large magnitude. The largest entries arelocated on the main diagonal and only a few adjacent bands have entriesof high magnitude. Most of remaining entries generally have much smallermodulus. In Figure 2.2.4, we plot for the same example the matrix obtainedby scaling A = [aij ] so that maxi,j |aij | = 1, and discarding from A allentries less than ε = 0.05 in modulus. This matrix is 98.5% sparse. Thefigure emphasizes the presence of the strong coupling among neighbouringedges introduced in the geometrical domain by the Boundary ElementMethod, and suggests the possibility of extracting a sparsity pattern fromA by simply discarding elements of negligible magnitude, which correspondto weak contributions of coupling among distant nodes.

Figure 2.2.3: Pattern structure of the large entries of A. The test problemis Example 5.

The dropping operation is generally referred to as sparsification. Theidea of sparsifying dense matrices before computing the preconditioner wasintroduced by Kolotilina [93] in the context of sparse approximate inversemethods. Alleon et al. [2], Chen [28] and Vavasis [144] used this idea forthe preconditioning of dense systems from the discretization of boundaryintegral equations, and Tang and Wan [140] in the context of multigridmethods. Similar ideas are also exploited by Ruge and Stuben [118] in the


Figure 2.2.4: Nonzero pattern for A when the smallest entries are discarded.The test problem is Example 5.

context of algebraic multigrid methods. On sparse systems, sparsificationcan be helpful to identify the most relevant connections in the directproblem, especially when the coefficient matrix contains many small entriesor is fairly dense (see [33] and [91]).

Several heuristics can be used to sparsify A and to try and retain themain contributions to the singular integrals. Some approaches are thefollowing:

• find, in each column of A, the k entries of largest modulus, wherek ¿ n is a positive integer. The choice of the parameter k is generallyproblem-dependent. The resulting matrix will have exactly k·n entries;

• for each column of A, select the row indices of the k largest entriesin modulus and then, for each row index i corresponding to one ofthese entries, performing the same search on column i. These new rowindices will be added to the previous ones to form the nonzero patternfor the column. This heuristic, referred to as neighbours of neighbours,is described in detail in [36];

• the same approach as in the previous heuristic, but performing morethan one iteration, and halving the number of largest entries to belocated at each iteration in order to preserve sparsity. In practice, twoiterations are enough [2];


• scaling A such that its largest entry has magnitude equal to 1, andretaining in the pattern only the elements located in positions (i, j)such that ‖aij‖ > ε, where the threshold parameter ε ∈ (0, 1). Thisheuristic was proposed by Kolotilina in [93].

Combinations of these approaches can be also used. In the numericalexperiments the preconditioners considered are constructed from the sparsenear-field approximation of A, computed by using the first heuristic. We willrefer to this matrix as sparsified(A) and denote it as A. We symmetrize thepattern after computing it in order to preserve symmetry in A. We considerthe following methods implemented as right preconditioners :

• SSOR(ω), where ω is the relaxation parameter ;

• IC(k), the incomplete Cholesky factorization technique with k levelsof fill-in, i.e. taking for the factors a sparsity pattern based on positionand prescribed in advance;

• AINV , the approximate inverse method introduced in [16] that usesa dropping strategy based on values;

• SPAI, a Frobenius-norm minimization technique with the adaptivestrategy proposed by Gould and Scott [76] for the selection of thesparsity pattern for the preconditioner.

In order to illustrate the trend in the behaviour of these preconditioners,we first show in Table 2.2.2 the number of iterations required to compute thesolution on Example 1. All the preconditioners are computed using the samesparse approximation of the original matrix and all have roughly the samenumber of nonzeros entries. In the incomplete Cholesky factorization, noadditional level of fill-in was allowed in the factors; with AINV , we selecteda suitable dropping threshold (around 10−3) to obtain the same degree ofdensity as the other methods; and finally, with SPAI, we chose a priori, foreach column of M , the same fixed maximum number of nonzeros as in thecomputation of sparsified(A). In the SSOR method, we choose ω=1. InTable 2.2.2 we give the number of iterations for both GMRES and SQMRthat actually also corresponds to the number of matrix-vector products thatis the most time consuming part of the algorithms. We intend, in thefollowing sections, to understand the numerical behaviour of these methodson electromagnetics problems, identifying some potential causes of failure.

2.2.1 SSOR

The SSOR preconditioner is the most basic preconditioning method apartfrom a diagonal scaling. It is defined as


Example 1 - Density of A = 4% - Density of M = 4%Precond. GMRES(50) GMRES(110) GMRES(∞) SQMRNone – 204 139 142Jacobi 465 174 134 142SSOR 214 100 100 145IC(0) – – 159 –AINV – – – –SPAI 336 79 79 *

Table 2.2.2: Number of iterations using both symmetric and unsymmetricpreconditioned Krylov methods to reduce the normwise backward error by10−5 on Example 1. The symbol ’-’ means that convergence was not obtainedafter 500 iterations. The symbol ’*’ means that the method is not applicable.

M = (D + ωE)D−1(D + ωET )

where E is the strictly lower triangular part of A, and D is the diagonalmatrix whose nonzero entries are the diagonal entries of A. In the caseω = 1, D + E is the lower part of A, including the diagonal, and D + ET isthe upper part of A. We recall that A is symmetric, because A is symmetricand we use a symmetric pattern for the sparsification.

In Table 2.2.3 we show the number of iterations required by differentKrylov solvers preconditioned by SSOR to reduce the residual by a factorof 10−5. For those experiments we use ω = 1 to compute the preconditionerand we consider increasing values of density for the matrix A. Althoughvery cheap to compute, SSOR is not very robust. Increasing the density ofthe sparse approximation of A does not help to improve its performance,and indeed on some problems it behaves like a diagonal scaling (ω = 0). InFigures 2.2.5 and Figures 2.2.6 we illustrate the sensitivity of the SQMRconvergence to the parameter ω for Examples 1 and 4. When SSOR is usedas a stationary iterative solver, the relaxation parameter ω is selected in theinterval [0,2]. When SSOR is used as a preconditioner, the choice of the ωparameter might be less constraining; thus we also show experiments withvalues a bit larger than 2.0.


Example 1

Density of A

GMRES(m)Bi -

CGStabUQMR SQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% – – 213 145 103 310 – 149 –4% – – 214 139 100 297 – 145 –6% – – 224 136 98 317 – 149 –8% – – 216 127 95 307 – 149 –

10% – – 202 126 94 360 – 151 –Example 2

Density of A

GMRES(m)Bi -


m=10 m=30 m=50 m=80 m=110

2% – 478 269 184 146 – – 195 –4% – – 281 178 145 349 – 187 –6% – – 350 194 152 – – 186 –8% – – 381 205 156 – – 189 –

10% – – 385 200 157 428 – 193 –Example 3

Density of A

GMRES(m)Bi -


m=10 m=30 m=50 m=80 m=110

2% – – 411 314 245 – – 419 –4% – – 405 306 233 – – 420 –6% – – 406 306 231 486 – 412 –8% – – 405 303 228 498 – 421 –

10% – – 406 302 229 – – 326 –Example 4

Density of A

GMRES(m)Bi -


m=10 m=30 m=50 m=80 m=110

2% 371 192 138 116 95 193 379 85 3424% 457 206 145 119 95 221 387 85 4006% 464 208 148 119 97 224 399 85 3568% 445 214 152 121 97 263 389 85 392

10% 475 217 157 122 96 223 396 85 402Continued on next page


Continued from previous pageExample 5

Density of A

GMRES(m)Bi -


m=10 m=30 m=50 m=80 m=110

2% – 327 208 152 125 371 – 67 –4% – 436 272 192 160 471 – 67 –6% – – 333 217 184 – – 68 –8% – – 381 231 191 – – 68 –

10% – – 423 242 195 – – 73 –Table 2.2.3: Number of iterations required by different Krylovsolvers preconditioned by SSOR to reduce the residual by10−5. The symbol ’-’ means that convergence was notobtained after 500 iterations.

0 0.5 1 1.5 2 2.5140

145

150

155

160

165

Value of ω

SQ

MR

iter

atio

ns

Example 1 − Size = 1080 − Density of sparsified(A) = 6 %

Figure 2.2.5: Sensitivity of SQMR convergence to the SSOR parameter ωfor Example 1.

2.2.2 Incomplete Cholesky factorization

Incomplete factorization methods are one of the most natural ways toconstruct preconditioners of implicit type. In the general nonsymmetriccase, they start from a factorization method such as LU or Cholesky


0 0.5 1 1.5 2 2.580

85

90

95

100

105

Value of ω

SQ

MR

iter

atio

ns

Example 4 − Size = 2016 − Density of sparsified(A) = 6 %

Figure 2.2.6: Sensitivity of SQMR convergence to the SSOR parameter ωfor Example 4.

decomposition or even QR factorization that decompose the matrix into theproduct of triangular factors, and thus modify it to reduce the constructioncost. The basic idea is to keep the factors artificially sparse, for instanceby dropping some elements in prescribed nondiagonal positions during thestandard Gaussian elimination algorithm. It is well known that, even whenthe matrix is sparse, the triangular factors L and U and similarly the unitaryand the upper triangular factors Q and R can often be fairly dense. Thepreconditioning operation z = M−1y is computed by solving the linearsystem LUz = y, where L ≈ L and U ≈ U , that is performed in twodistinct steps:

1. solve Lw = y

2. solve Uz = w.

ILU preconditioners are amongst the most reliable in a general setting.Originally developed for sparse matrices, they can be applied also to densesystems, by extracting a sparsity pattern in advance, and performing theincomplete factorization on the sparsified matrix. This class has beenintensively studied, and successfully employed on a wide range of symmetricproblems, providing a good balance between computational costs andreduction of the number of iterations (see [27] and [55]). Well knowntheoretical results on the existence and stability of the factorization canbe proved for the class of M -matrices [105], and recent studies involve moregeneral symmetric matrices, both structured and unstructured.


In this section, we consider the incomplete Cholesky factorization anddenote it by IC. We assume that the standard IC factorization matrix Mof A is given in the following form

M = LDLT , (2.2.2)

whereD and L stand for, respectively, the diagonal matrix and the unit lowertriangular matrix whose entries are computed by means of the algorithmgiven in Figure 2.2.7. The set F of fill-in entries to be kept is given by

F = { (k, i) | lev(lk,i) ≤ ` } ,

where integer ` denotes a user specified maximal fill-in level. The levellev(lk,i) of the coefficient lk,i of L is defined by:

Initialization

lev(lk,i) =

0 if lk,i 6= 0 or k = i

∞ otherwiseFactorization

lev(lk,i) = min { lev(lk,i) , lev(li,j) + lev(lk,j) + 1 } .

The resulting preconditioner is usually denoted by IC(`). Alternativestrategies that dynamically discard fill-in entries are summarized in [122].

In Tables 2.2.4 to 2.2.8, we display the number of iterations usingan incomplete Cholesky factorization preconditioner on the five modelproblems. In this and in the forthcoming tables the symbol ’-’ means thatconvergence was not obtained after 500 iterations. We show results forincreasing values of the density for the sparse approximation of A as wellas various levels of fill-in. The general trend is that increasing the fill-ingenerally produces a much more robust preconditioner than IC(0) applied toa denser sparse approximation of the original matrix. Moreover, IC(`) with` ≥ 1 may deliver a good rate of convergence provided the coefficient matrixis not too sparse, as we get closer to LDLT . However, on indefinite problemsthe numerical behaviour of IC can be fairly chaotic. This can be observed inTable 2.2.8 for Example 5. The factorization of a very sparse approximation(up to 2%) of the coefficient matrix can be stable and deliver a good rate ofconvergence, especially if at least one level of fill-in is retained. For highervalues of density for the approximation of A, the factors may become veryill-conditioned and consequently the preconditioner is very poor. As shownin the tables, ill-conditioning of the factors is not related to ill-conditioningof the matrix A. This behaviour has been already observed on sparse realindefinite systems, see for instance [34].

As an attempt for a possible remedy, following [109, 110], we apply IC(`)to a perturbation of A by a complex diagonal matrix. More specifically, we


Compute D and L

Initialization phase

di,i = ai,i , i = 1, 2, · · · , n

li,j = ai,j , i = 2, · · · , n , j = 1, 2, · · · , i− 1

Incomplete factorization process

do j = 1, 2, · · · , n− 1

do i = j + 1, j + 2, · · · , n

di,i = di,i − l2i,j

dj,j

li,j =li,j

dj,j

do k = i + 1, i + 2, · · · , n

if (i, k) ∈ F lk,i = lk,i − li,j lk,j

end do

end do

end do

Figure 2.2.7: Incomplete factorization algorithm - M = LDLT .

useAτ = A + i τh∆r , (2.2.3)

where ∆r = diag(Re(A)) = diag(Re(A)), and τ stands for a nonnegativereal parameter, while

h = n−1d with d = 3 (the space dimension). (2.2.4)

The intention is to move the eigenvalues of the preconditioned system alongthe imaginary axis and thus avoid a possible eigenvalue cluster close to zero.

In Table 2.2.9, we show the number of SQMR iterations for differentvalues of τ , the shift parameter, and various levels of fill-in in thepreconditioner. The value of the shift is problem-dependent, and should beselected to ensure a good balance between making the factorization processmore stable without perturbing significantly the coefficient matrix. A goodvalue can be between 0 and 2. Although it is not easy to tune and its effect isdifficult to predict, a small diagonal shift can help to compute a more stablefactorization, and in some cases the performance of the preconditioner cansignificantly improve.

In Figures 2.2.8, 2.2.9 and 2.2.10, we illustrate the effect of this shiftstrategy on the eigenvalue distribution of the preconditioned matrix. For


Example 1Density of A = 2% - K∞(A) = 50321

IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 2.0% – – –IC(1) 4.5% – – –IC(2) 7.8% – – –

Density of A = 3% - K∞(A) = 120282IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 3.0% – – –IC(1) 7.5% – – –IC(2) 13.0% – – –

Density of A = 4% - K∞(A) = 29727IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 4.0% – – –IC(1) 11.9% – – –IC(2) 23.4% – – 194

Density of A = 5% - K∞(A) = 5350IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 5.0% – – 398IC(1) 16.9% – – 222IC(2) 32.3% 310 100 86

Density of A = 6% - K∞(A) = 12610IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 6.0% – – 296IC(1) 21.7% – – 128IC(2) 39.0% 81 46 45

Table 2.2.4: Number of iterations, varying the sparsity level of A and thelevel of fill-in on Example 1.

each value of the shift parameter τ , we display κ(L), the condition number(calculated using the LAPACK package) of the computed L factor, andthe number of iterations required by SQMR. The eigenvalues are scatteredall over the complex plane when no shift is used, whereas they look moreclustered when a shift is applied. As we mentioned before, a clusteredspectrum of the preconditioned matrix is usually considered a desirableproperty for fast convergence of Krylov solvers. However, for incompletefactorizations the condition number of the factors plays a more importantrole than the eigenvalue distribution on the rate of convergence of the Kryloviterations. In fact, if the triangular factors computed by the incompletefactorization process are very ill-conditioned, the long recurrences associated



IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 2.0% – – 168IC(1) 4.1% – – 386IC(2) 6.6% – – –

Density of A = 3% - K∞(A) = 998IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 3.0% – – 171IC(1) 6.7% 84 76 35IC(2) 11.5% 84 46 30

Density of A = 4% - K∞(A) = 737IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 4.0% – 327 121IC(1) 9.9% 46 38 31IC(2) 17.5% 32 31 25




with the triangular solves are unstable and the use of the preconditioner maybe totally uneffective.An auto-tuned strategy might be designed, which consists in incrementingthe value of the shift and computing a new incomplete factorization ifthe condition number of the current factor is too large. Although timeconsuming, this strategy might construct a robust shifted IC factorizationon highly indefinite problems.



IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 2.0% – – –IC(1) 4.5% – – –IC(2) 7.0% – – –

Density of A = 3% - K∞(A) = 13269IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 3.0% – – –IC(1) 7.1% – 247 110IC(2) 11.3% 60 41 40







IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 2.0% 285 221 98IC(1) 5.1% 46 42 30IC(2) 8.6% 30 30 24


Density of A = 4% - K∞(A) = 322IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 4.0% 255 187 96IC(1) 10.9% 24 24 15IC(2) 17.9% 19 19 12

Density of A = 5% - K∞(A) = 369IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 5.0% – – –IC(1) 14.7% 23 23 15IC(2) 24.5% 19 19 11

Density of A = 6% - K∞(A) = 370IC(level) Density of M GMRES(30) GMRES(50) SQMRIC(0) 6.0% 477 341 146IC(1) 18.6% 19 19 12IC(2) 30.2% 16 16 10




IC(level) Density of M κ∞(L) GMRES(30) GMRES(50) SQMRIC(0) 2.0% 2 · 103 378 245 102IC(1) 5.1% 1 · 103 79 68 45IC(2) 9.1% 9 · 102 58 48 34

Density of A = 3% - K∞(A) = 270IC(level) Density of M κ∞(L) GMRES(30) GMRES(50) SQMRIC(0) 3.0% 1 · 106 – – –IC(1) 7.8% 1 · 105 – – –IC(2) 12.8% 3 · 103 48 45 30

Density of A = 4% - K∞(A) = 253IC(level) Density of M κ∞(L) GMRES(30) GMRES(50) SQMRIC(0) 4.0% 6 · 109 – – –IC(1) 11.7% 2 · 105 – – –IC(2) 19.0% 7 · 103 40 38 25

Density of A = 5% - K∞(A) = 285IC(level) Density of M κ∞(L) GMRES(30) GMRES(50) SQMRIC(0) 5.0% 6 · 1010 – – –IC(1) 14.6% 1 · 105 – – 307IC(2) 23.0% 3 · 104 150 84 49

Density of A = 6% - K∞(A) = 294IC(level) Density of M κ∞(L) GMRES(30) GMRES(50) SQMRIC(0) 6.0% 8 · 1011 – – –IC(1) 18.8% 5 · 1011 – – –IC(2) 29.6% 7 · 104 – – 242



Example 1 - Density of A = 5%

IC(level) Density of Mτ

0.0 0.1 0.3 0.5 0.7 0.9 1.1IC(0) 5.0% 398 – 222 166 117 123 109IC(1) 16.9% 222 – – 169 90 73 67IC(2) 32.3% 86 159 146 134 67 68 62



0.0 0.1 0.3 0.5 0.7 0.9 1.1IC(0) 2.0% 168 423 – – 458 182 180IC(1) 4.1% 386 – – – 363 141 142IC(2) 6.6% – 380 200 – 474 142 117



0.0 0.1 0.3 0.5 0.7 0.9 1.1IC(0) 3.0% – – – – – 179 172IC(1) 7.1% 110 139 – 336 95 109 145IC(2) 11.3% 40 92 – 95 80 85 90



0.0 0.1 0.3 0.5 0.7 0.9 1.1IC(0) 3.0% 467 189 – – – – 206IC(1) 8.4% 24 26 60 234 – – –IC(2) 14.2% 14 15 21 28 – – –



0.0 0.1 0.3 0.5 0.7 0.9 1.1IC(0) 4.0% – – – – – – –IC(1) 11.7% – – – – – – –IC(2) 19.0% 25 131 123 – – – –

Table 2.2.9: Number of SQMR iterations, varying the shift parameter forvarious level of fill-in in IC.


(a) τ = 0.0 - κ(L) = 526284 -SQMR iter. = +500

(b) τ = 0.1 - κ(L) = 134975 -SQMR iter. = +500

(a) τ = 0.3 - κ(L) = 9608 -SQMR iter. = 313

(b) τ = 0.5 - κ(L) = 2165 -SQMR iter. = 161

(c) τ = 0.7 - κ(L) = 777 - SQMRiter. = 117

(d) τ = 0.9 - κ(L) = 434 - SQMRiter. = 104

(c) τ = 1.1 - κ(L) = 261 - SQMRiter. = 95

(d) τ = 1.3 - κ(L) = 183 - SQMRiter. = 94

Figure 2.2.8: The spectrum of the matrix preconditioned with IC(1), thecondition number of L, and the number of iterations with SQMR for variousvalues of the shift parameter τ . The test problem is Example 1 and thedensity of A is around 3%.


(a) τ = 0.0 - κ(L) = 526284 -SQMR iter. = +500

(b) τ = 0.1 - κ(L) = 134975 -SQMR iter. = +500

(a) τ = 0.3 - κ(L) = 9608 -SQMR iter. = 313

(b) τ = 0.5 - κ(L) = 2165 -SQMR iter. = 161

(c) τ = 0.7 - κ(L) = 777 - SQMRiter. = 117

(d) τ = 0.9 - κ(L) = 434 - SQMRiter. = 104

(c) τ = 1.1 - κ(L) = 261 - SQMRiter. = 95

(d) τ = 1.3 - κ(L) = 183 - SQMRiter. = 94

Figure 2.2.9: The eigenvalue distribution on the square [-1, 1] of the matrixpreconditioned with IC(1), the condition number of L, and the number ofiterations with SQMR for various values of the shift parameter τ . The testproblem is Example 1 and the density of A is around 3%.


(a) τ = 0.0 - κ(L) = 526284 -SQMR iter. = +500

(b) τ = 0.1 - κ(L) = 134975 -SQMR iter. = +500

(a) τ = 0.3 - κ(L) = 9608 -SQMR iter. = 313

(b) τ = 0.5 - κ(L) = 2165 -SQMR iter. = 161

(c) τ = 0.7 - κ(L) = 777 - SQMRiter. = 117

(d) τ = 0.9 - κ(L) = 434 - SQMRiter. = 104

(c) τ = 1.1 - κ(L) = 261 - SQMRiter. = 95

(d) τ = 1.3 - κ(L) = 183 - SQMRiter. = 94

Figure 2.2.10: The eigenvalue distribution on the square [-0.3, 0.3] of thematrix preconditioned with IC(1), the condition number of L, and thenumber of iterations with SQMR for various values of the shift parameterτ . The test problem is Example 1 and the density of A is around 3%.


2.2.3 AINV

An alternative way to construct a preconditioner is to compute an explicitapproximation of the inverse of the coefficient matrix. In this sectionwe consider two techniques, the first constructs an approximation of theinverse of the factors using an A-biconjugation process [19] and the other aFrobenius-norm minimization technique [93].

If the matrix A can be written in the form LDLT where L is unit lowertriangular and D is diagonal, then its inverse can be decomposed as A−1 =L−T D−1L−1 = ZD−1ZT where Z = L−T is unit triangular. Factorizedsparse approximate inverse techniques compute sparse approximations Z ≈Z, so that the resulting preconditioner will be M = ZD−1ZT ≈ A−1, forD ≈ D.

In the approach known as AINV the triangular factors are computedby means of a set of A-biconjugate vectors {zi}n

i=1, such that zTi Azj = 0 if

and only if i 6= j. Then, introducing the matrix Z = [z1, z2, ...zn] the relation

ZT AZ = D =

p1 0 . . . 00 p2 . . . 0...

.... . .

...0 0 . . . pn

holds, where pi = zTi Azi 6= 0 , and the inverse is equal to

A−1 = ZD−1ZT =n∑

i=1

zizTi

pi.

The sets of A-biconjugate vectors are computed by means of a (two-sided)Gram-Schmidt orthogonalization process with respect to the bilinear formassociated with A. A sketch of the algorithm is resumed in Figure 2.2.11.In exact arithmetic this process can be completed if and only if A admits aLU factorization. AINV does not require a pattern prescribed in advancefor the approximate inverse factors, and sparsity is preserved during theprocess, by discarding elements in the computed approximate inverse factorhaving magnitude smaller than a given positive threshold.

An alternative approach was proposed by Kolotilina and Yeremin ina series of papers ([95, 96, 97, 98]). This approach, known as FSAI,approximates A−1 by the factorization GT G, where G is a sparse lowertriangular matrix approximating the inverse of the lower triangular Choleskyfactor, L, of A. This technique has obtained good results on some difficultproblems and is suitable for parallel implementation, but it requires an apriori prescription for the sparsity pattern for the approximate factors. Theapproximate inverse factor is computed by minimizing ||I − GL||2F , thatcan be accomplished without knowing the Cholesky factor L by solving the


Compute D−1 and Z

Initialization phase

z(0)i = ei (1 ≤ i ≤ n), A = [a1, · · · , an]

The biconjugation algorithm

do i = 1, 2, · · · , n

do j = i, i + 1, · · · , n

p(i−1)j = aT

i z(i−1)j

end do

do j = i + 1, · · · , n

z(i)j = z

(i−1)j − (p

(i−1)j /p

(i−1)i )z

(i−1)i

end do

end do

zi = z(i−1)i , pi = p

(i−1)i

Figure 2.2.11: The biconjugation algorithm - M = ZD−1ZT .

normal equations{GLLT }ij = LT

ij , (i, j) ∈ SL (2.2.5)

where SL is a lower triangular nonzero pattern for G. Equation (2.2.5) canbe replaced by

{GA}ij = Iij , (i, j) ∈ SL (2.2.6)

where G = D−1G and D is the diagonal of L. Then, each row of Gcan be computed independently by solving a small linear system. Thepreconditioned linear system has the form

GAGT = DGAGT D.

The matrix D is not known and is generally chosen so that the diagonal ofGAGT is all ones.

Recently another matrix inversion based on incomplete biconjugationhas been proposed in [148]. The idea is to compute a lower unit triangularmatrix

L = [L1, L2, ...Ln] of order n,

such that LT AL is a diagonal nonsingular matrix, say

D−1 =diag[d−111 , d−1

22 ...d−1nn ] .


This is equivalent to the relations

LTi ALj

{= 0 if i 6= j6= 0 if i = j

· (2.2.7)

In other words LTi and Lj are A-biconjugate, and then the inverse can be

written as A−1 = LDLT . A procedure computes the inverse factors ofA−1 using relations 2.2.7 and preserves a sparsity pattern for the factor Ldiscarding entries with small modulus.

In Table 2.2.10 we show the number of iterations needed by GMRES andSQMR preconditioned by AINV to reduce the normwise backward error by10−5 on the five examples considered. On the most difficult problems, theperformance of this preconditioner is very poor. For low values of densityof A, AINV is less effective than a diagonal scaling, and its quality does notimprove even when the dense coefficient matrix is used for the constructionas shown in the results of Table 2.2.11. Both re-ordering and shift strategiesdo not improve the effectiveness of the preconditioner. We performedin particular experiments with the reverse Cuthil-MacKee ordering [37],the minimum degree ordering [71, 141] and the spectral nested dissectionordering [114]. The best performance were observed with the minimumdegree algorithm that in some cases enables to have smaller norm-wisebackward error at the end of convergence. We mention that very similaror sometimes more disappointing results have been observed with the FSAImethod and the other factorized approximate inverse proposed in [148].


Example 1

Density of AGMRES(m)

SQMRm=50 m=110 m=∞

2% – – – –

4% – – – –

6% – – 313 –

8% – – 350 –

10% – – 207 306

Example 2


SQMRm=50 m=110 m=∞

2% – – – –

4% – – 206 345

6% 402 213 143 175

8% 318 195 120 132

10% 144 93 93 99

Example 3


SQMRm=50 m=110 m=∞

2% – – – –

4% 264 101 101 105

6% 56 51 51 48

8% 37 37 37 34

10% 31 31 31 29

Example 4


SQMRm=50 m=110 m=∞

2% – – 280 387

4% 83 68 68 57

6% 46 46 46 34

8% 42 42 42 32

10% 48 48 48 38

Example 5


SQMRm=50 m=110 m=∞

2% 177 142 121 111

4% – – 213 251

6% – 407 194 210

8% – 404 179 207

10% – 328 154 189

Table 2.2.10: Number of iterations required by different Krylov solverspreconditioned by AINV to reduce the residual by 10−5. The symbol ’-’means that convergence was not obtained after 500 iterations.


Example 1


SQMRm=50 m=110 m=∞

2% – – – –

4% – – – –

6% – – – –

8% – – – –

10% – – 483 –

Example 2


SQMRm=50 m=110 m=∞

2% – – – –

4% – – 495 –

6% – – 361 –

8% – – 279 –

10% – – 209 486

Example 3


SQMRm=50 m=110 m=∞

2% – 288 153 176

4% 101 79 79 78

6% 66 57 57 52

8% 42 42 42 38

10% 36 36 36 34

Example 4


SQMRm=50 m=110 m=∞

2% – – 211 245

4% – 315 154 182

6% – 202 127 142

8% 447 107 107 114

10% 198 90 90 91

Example 5


SQMRm=50 m=110 m=∞

2% – – – –

4% – – – –

6% – – 259 474

8% – – 229 358

10% – – 216 374

Table 2.2.11: Number of iterations required by different Krylovsolvers preconditioned by AINV to reduce the residual by 10−5. Thepreconditioner is computed using the dense coefficient matrix. Thesymbol ’-’ means that convergence was not obtained after 500 iterations.


Possible causes of failure of factorized approximate inverses

One potential difficulty with the factorized approximate inverse methodAINV is the tuning of the threshold parameter that controls the fill-inin the inverse factors. For a typical example we display in Figure 2.2.12the sparsity pattern of A−1 (on the left) and L−1, the inverse of itsCholesky factor (on the right), respectively, where all the entries smallerthan 5.0 × 10−2 have been dropped after a symmetric scaling such thatmaxi |aji| = maxi |`ji| = 1. The location of the large entries in theinverse matrix exhibit some structure. In addition, only a very smallnumber of its entries have large magnitude compared to the others thatare much smaller. This fact has been successfully exploited to definevarious a priori pattern selection strategies for Frobenius norm minimizationpreconditioners [2, 22] in a non-factorized form. On the contrary, the inversefactors that are explicitely approximated by AINV and by FSAI can betotally unstructured as shown in Figure 2.2.12(b). In this case, the a prioriselection of a sparse pattern for the factors can be extremely hard as noreal structures are revealed, preventing the use of techniques like FSAI.In Figure 2.2.13 we plot the magnitude of the entries in the first columnof A−1 (on the left) and L−1 (on the right), respectively, with respect totheir row index. These plots indicate that any dropping strategy, eitherstatic or dynamic, may be very difficult to tune as it can easily discardrelevant information and potentially lead to a very poor preconditioner.Selecting too small a threshold would retain too many entries and lead to afairly dense preconditioner. For instance on the small example considered,if a threshold of 0.05 is used the preconditioner is 14.8% dense. A largerthreshold would yield a sparser preconditioner but might discard too manyentries of moderate magnitude that are important for the preconditioner. Onthe previous example all the entries with magnitude smaller than 0.2 mustbe dropped to keep the density in the inverse factor around 3%. Becauseof these issues, finding the appropriate threshold to enable a good trade-offbetween sparsity and numerical efficiency is challenging and very problem-dependent.

2.2.4 SPAI

Frobenius-norm minimization is a natural approach for building explicitpreconditioners. This method computes a sparse approximate inverse asthe matrix M = {mij} which minimizes ‖I − MA‖F (or ‖I − AM‖F

for right preconditioning) subject to certain sparsity constraints. Earlyreferences to this latter class can be found in [12, 13, 14, 65] and in [2]for some applications to boundary element matrices in electromagnetism.The Frobenius-norm is usually chosen since it allows the decoupling of theconstrained minimization problem into n independent linear least-squares


0 20 40 60 80 100 120

0

20

40

60

80

100

120

Density = 8.75%

(a) Sparsity pattern ofsparsified(A−1)

0 20 40 60 80 100 120

0

20

40

60

80

100

120

Density = 29.39%

(b) Sparsity pattern ofsparsified(L−1)

Figure 2.2.12: Sparsity patterns of the inverse of A (on the left) and ofthe inverse of its lower triangular factor (on the right), where all the entrieswhose relative magnitude is smaller than 5.0× 10−2 are dropped. The testproblem, representative of the general trend, is a small sphere.

0 20 40 60 80 100 1200

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Column of A−1

Mag

nitu

de o

f the

ent

ries

in th

e 1s

t row

of A

−1

(a) Histogram of the magnitudeof the entries of the first column

of A−1

0 20 40 60 80 100 1200

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

(b) Histogram of the magnitudeof the entries in the first columnof the inverse of a factor of A

Figure 2.2.13: Histograms of the magnitude of the entries of the first columnof A−1 and its lower triangular factor. A similar behaviour has been observedfor all the other columns. The test problem, representative of the generaltrend, is a small sphere.

problems, one for each column of M (when preconditioning from the right)or row of M (when preconditioning from the left).The independence of these least-squares problems follows immediately fromthe identity:

‖I −MA‖2F = ‖I − AMT ‖2

F =n∑

j=1

‖ej − Amj•‖22 (2.2.8)

where ej is the j-th unit vector and mj• is the column vector representingthe j-th row of M .


In the case of right preconditioning, the analogous relation

‖I − AM‖2F =

n∑

j=1

‖ej − Am•j‖22 (2.2.9)

holds, where m•j is the column vector representing the j-th column ofM . Clearly, there is considerable scope for parallelism in this approach.However, the precondioner is not guaranteed to be nonsingular, and thesymmetry of A is generally not preserved in M . The main issue for thecomputation of the sparse approximate inverse is the selection of the nonzeropattern of M , that is the set of indices

S = { (i, j) ∈ [1, n]2 s.t. mij = 0 }.If the sparsity pattern of M is known, the nonzero structure for the j-thcolumn of M is automatically determined, and defined as

J = {i ∈ [1, n] s.t. (i, j) ∈ S}.The least-squares solution involves only the columns of A indexed by J ; weindicate this subset by A(:, J). Because A is sparse, many rows in A(:, J) areusually null, not affecting the solution of the least-squares problems (2.2.9).Thus if I is the set of indices corresponding to the nonzero rows in A(:, J),and if we define by A = A(I, J), by mj = mj(J), and by ej = ej(J), theactual “reduced” least-squares problems to solve are

min‖ej − Amj‖2, j = 1, .., n. (2.2.10)

Usually problems (2.2.10) have much smaller size than problems (2.2.9).Two different approaches can be followed for the selection of the sparsity

pattern of M : an adaptive technique that dynamically tries to identifythe best structure for M ; and a static technique, where the pattern ofM is prescribed a priori based on some heuristics. The idea is to keepM reasonably sparse while trying to capture the “large” entries of theinverse, which are expected to contribute the most to the quality of thepreconditioner. A static approach that requires an a priori nonzero patternfor the preconditioner, introduces significant scope for parallelism and hasthe advantage that the memory storage requirements and computationalcost for the setup phase are known in advance. However, it can be veryproblem dependent.

A dynamic approach is generally effective but is usually very expensive.These methods usually start with a simple initial guess, like a diagonalmatrix, and then improve the pattern until a criterion of the form ‖Amj −ej‖2 < ε (for each j) is satisfied for a given ε > 0, ej being the j-th columnof the identity matrix, or until a maximum number of nonzeros in the j-thcolumn mj of M has been reached.


Different strategies can be adopted to enrich the initial nonzero structureof the j-th column of the preconditioner. The method known as SPAI [84]uses some heuristic to select the new indices by predicting those that canmost effectively reduce the residual

‖r‖2 = ‖A(:, J)mj − ej‖2 (2.2.11)

Grote and Huckle [84] propose solving a one-dimensional minimizationproblem. If L = {l s.t. r(l) 6= 0}, then the new candidates are selected fromI = {j s.t. A(L, j) 6= 0}. They suggest solving, for each j ∈ I the followingproblem

minµj‖r + µjAej‖2.

The solution of this problem is

µj =rT Aej

‖Aej‖22

,

and the residual of the updated solution is given by

ρj = ‖r‖2 −rT Aej

‖Aej‖22

.

The proposed heuristic selects the indices which maximize rT Aej

‖Aej‖22. More

than one new candidate can be selected at a time, and the algorithm stopswhen either a maximum number of nonzeros per column is reached orthe required accuracy is achieved. The algorithm can deliver very goodpreconditioners even on hard problems, but at the cost of huge times andmemory although the execution time can be significantly reduced becauseof parallelism. A comparison in terms of construction cost with ILU-typemethods can be found in [18, 76].

In Table 2.2.12, we show the number of iterations needed by Krylovsolvers preconditioned by SPAI to solve the model problems. As for theother preconditioners, we consider different levels of density in the sparseapproximation of A. Provided the preconditioner is dense enough, SPAI isquite effective in reducing the number of iterations. Also, the quality of thepreconditioner on difficult problems can be remarkably improved if the densecoefficient matrix is used for the construction. For instance on Example 1, ifSPAI is computed using the full A, then a density of 2% for the approximateinverse enables the convergence of GMRES(80) in 75 iterations, whereasconvergence is not achieved in 500 iterations if the approximate inverse iscomputed using a sparse approximation of A. However the adaptive strategyrequires a prohibitive time. The construction of the approximate inverseusing 6% density for A takes nearly one hour of computation on a SGI


Origin 2000 for Example 4 and three hours for Example 5. When using thedense matrix A in the computation, the construction of the preconditionerfor the same examples takes more than one day.

2.2.5 SLU

In this section we use the sparsified matrix A as an implicit preconditioner;that is, the sparsified matrix is factorized using ME47, a sparse direct solverfrom HSL [87], and those exact factors are used as the preconditioner. Thusit represents an extreme case with respect to ILU(0), since a complete fill-in is allowed in the factors. This method will be referred to as SLU. Thisapproach, although not easily parallelizable, is generally quite effective onthis class of applications for dense enough sparse approximations of A.

In Table 2.2.13 we show the number of iterations required by differentKrylov solvers preconditioned by SLU to reduce the normwise backwarderror by a factor of 10−5. This approach, although not easily parallelizable,is generally quite effective on this class of applications for dense enoughsparse approximations of A. However, as shown in the table, whenthe preconditioner is very sparse, the numerical quality of this approachdeteriorates and the Frobenius-norm minimization method is more robust.


Example 1

Density of AGMRES(m) Bi -

CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% – – – – – – – –

4% – – 336 79 79 333 254 370

6% – – 150 65 65 269 243 312

8% – 242 82 56 56 175 195 240

10% – 237 50 50 50 127 174 196

Example 2


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% – – – – 212 – – –

4% – – 494 79 79 371 315 –

6% – – 185 72 72 291 279 432

8% – – 134 66 66 277 287 406

10% – – 109 62 62 229 267 458

Example 3


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% – – – – – – – –

4% – – 194 72 72 187 255 340

6% – 230 80 55 55 153 177 222

8% – 151 48 48 48 181 162 196

10% – 151 46 46 46 157 159 208

Example 4


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% – – 253 81 81 – 309 394

4% – – 187 113 85 374 331 424

6% – 401 153 76 76 288 270 370

8% – 90 47 47 47 76 171 170

10% 41 28 28 28 28 35 105 74

Example 5


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% – – – – – – – –

4% – 183 138 73 73 213 457 338

6% – – 194 122 93 – 448 442

8% – 289 137 71 71 – 345 358

10% – 283 100 68 68 – 334 266

Table 2.2.12: Number of iterations required by different Krylov solverspreconditioned by SPAI to reduce the residual by 10−5. The symbol ’-’means that convergence was not obtained after 500 iterations.


Example 1


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% +500 +500 +500 364 241 +500 +500 486

4% +500 +500 128 65 65 136 111 114

6% 60 31 31 31 31 23 36 28

8% 51 27 27 27 27 21 34 22

10% 33 22 22 22 22 14 25 17

Example 2


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% +500 +500 +500 288 109 489 290 229

4% 50 30 30 30 30 18 42 22

6% 40 27 27 27 27 16 38 21

8% 32 24 24 24 24 14 35 19

10% 26 21 21 21 21 13 30 16

Example 3


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% +500 +500 330 171 108 207 234 206

4% 38 27 27 27 27 16 29 19

6% 27 21 21 21 21 11 22 14

8% 21 17 17 17 17 10 17 12

10% 18 15 15 15 15 9 16 10

Example 4


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% 37 35 34 34 34 17 39 21

4% 23 21 21 21 21 10 24 14

6% 18 17 17 17 17 9 18 10

8% 15 15 15 15 15 8 16 9

10% 14 13 13 13 13 7 15 9

Example 5


CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

2% 72 45 42 42 34 46 37

4% 42 29 29 29 29 23 32 25

6% 29 26 26 26 26 20 28 16

8% 29 23 23 23 23 17 25 15

10% 28 21 21 21 21 17 25 18

Table 2.2.13: Number of iterations required by different Krylov solverspreconditioned by SLU to reduce the residual by 10−5. The symbol ’-’means that convergence was not obtained after 500 iterations.


2.2.6 Other preconditioners

A third class of explicit methods deserves to be mentioned here, althoughwe will not consider it in our numerical experiments. It is based on ILUtechniques, and in the general nonsymmetric case it builds the sparseapproximate inverse by first performing an incomplete LU factorizationA ≈ LU and then approximately inverting the L and U factors by solvingthe 2n triangular linear systems

{Lxi = ei

Uyi = ei(1 ≤ i ≤ n).

These two systems are solved approximately, prescribing two sparsitypattern for L and U and using a Frobenius-type method, or the adaptiveSPAI method without any pattern in advance. Another approach, whichhas provided better results, consists in solving the 2n triangular systems bycustomary forward and backward substitution, respectively, and adoptingdropping strategy, based either on position or on values, to maintain sparsityin the columns of L and U . Generally two different levels of incompletenessare applied, rather than one as in the other approximate inverse methods.These preconditioners are not easy to use; relying on ILU factorization, theyare almost useless for highly nonsymmetric, indefinite matrices and sinceincomplete processes are strongly sequential, the preconditioner buildingphase is not entirely parallelizable, although the independence of the twotriangular solves suggest a good scope for parallelism. References to thisclass can be found in [3, 40, 133].

2.3 Concluding remarks

In this chapter we have established the need for preconditioning linearsystems of equations which arise from the discretization of boundaryintegral equations in electromagnetism. We have discussed several standardpreconditioners based on sparsification strategies and have studied andcompared their numerical behaviour on a set of model problems that may berepresentative of real electromagnetic calculation. We have shown that theincomplete factorization process is highly unstable on indefinite matriceslike those arising from the discretization of the EFIE formulation. Usingnumerical experiments we have shown that the triangular factors computedby the factorization can be very ill-conditioned, and the long recurrencesassociated with the triangular solves are unstable. As an attempt at apossible remedy, we have introduced a small complex shift to move theeigenvalues of the preconditioned system along the imaginary axis andthus try to avoid a possible cluster of eigenvalues close to zero. A smalldiagonal complex shift can help to compute a more stable factorization.

2.4. Concluding remarks 51

However, suitable strategies can be introduced to tune the optimal valueof the shift and to predict its effect. Factorized approximate inverses,namely AINV and FSAI, exhibit poor convergence behaviour becausethe inverse factors can be totally unstructured; both reordering and shiftstrategies do not improve their effectiveness. Any dropping strategy, eitherstatic or dynamic, may be very difficult to tune as it can easily discardrelevant information and potentially lead to a very poor preconditioner.Among different techniques, Frobenius norm minimization methods arequite efficient because they deliver a good rate of convergence. However,they require a high computational effort, so that their use is mainly effectivein a parallel setting. To be computationally affordable on dense linearsystems, Frobenius-norm minimization preconditioning techniques requirea suitable strategy to identify the relevant entries to consider in the originalmatrix A, in order to define small least-squares problems, as well as anappropriate sparsity structure for the approximate inverse. Prescribing apattern in advance for the preconditioner can greatly reduce the amountof work in terms of CPU-time. The problem of cost is evident for thecomputation of SPAI, since fast convergence can be obtained for high valuesof the sparsity ratio, but then the adaptive strategy requires a prohibitivetime and computational cost in a sequential environment. Compared tosparse approximate inverse methods, SSOR is generally slower, but is verycheap to compute. Its main drawback is that it is not parallelizable and inaddition, for much larger problems, the cost per iteration will grow so thatthis preconditioner will no longer be competitive with the other techniques.Finally, the SLU preconditioner, although generally quite effective on thisclass of applications, is not easily parallelizable and requires dense enoughsparse approximations of A. This preconditioner can be expensive in termsof both memory and CPU time for the solution of large problems, and thusit is mainly interesting for comparison purpose.

Chapter 3

Sparse pattern selectionstrategies for robustFrobenius-normminimization preconditioner

In the previous chapter, we established the need for preconditioninglinear systems of equations arising from the discretization ofboundary integral equations (expressed via the EFIE formulation) inelectromagnetism. We briefly discussed some preconditioners and comparedtheir performance on a set of model problems arising both from academicand from industrial applications. The numerical results suggests that sparseapproximate inverse techniques can be good candidates to preconditionthis class of problems efficiently. In particular, the Frobenius-normminimization approach can greatly reduce the number of iterations neededif compared with the implicit approach based on incomplete factorization.In addition Frobenius-norm minimization is inherently parallel. To becomputationally affordable on dense linear systems, Frobenius-normminimization preconditioners require a suitable strategy to identify therelevant entries to consider in the original matrix A, in order to define smallleast-squares problems, as well as an appropriate sparsity structure for theapproximate inverse.

In this chapter, we propose some efficient static nonzero patternselection strategies both for the preconditioner and for the selection ofthe entries of A. In Section 3.1, we overview both dynamic and staticapproaches to compute the sparsity pattern of Frobenius-norm minimizationpreconditioners. In Section 3.2, we introduce and compare some strategiesto prescribe in advance the nonzero structure of the preconditioner inelectromagnetic applications. In Section 3.3, we propose the use of a different

53

54 3. Sparse pattern selection strategies for robust ...

pattern selection procedure for the original matrix from that used for thepreconditioner and finally, in Section 3.4 we illustrate the numerical andcomputational efficiency of the proposed preconditioners on a set of modelproblems.


We introduced Frobenius-norm minimization in Section 2.2.4. The ideais to compute the sparse approximate inverse of a matrix A as the matrixM which minimizes ‖I −MA‖F (or ‖I − AM‖F for right preconditioning)subject to certain sparsity constraints. The main issue is the selection ofthe nonzero pattern of M . The idea is to keep M reasonably sparse whiletrying to capture the “large” entries of the inverse, which are expected tocontribute the most to the quality of the preconditioner. For this purpose,two approaches can be followed: an adaptive technique that dynamicallytries to identify the best structure for M ; and a static technique, where thepattern of M is prescribed a priori based on some heuristics.

A simple approach is to prescribe the locations of nonzeros of M beforecomputing their actual values. When the coefficient matrix has a specialstructure or special properties, efforts have been made to find a patternthat can retain the entries of A−1 having large modulus [42, 48, 49, 138],and indeed some theoretical studies have shown that there are cases wherethe large entries in A−1 are clustered near the diagonal [58, 106]. If A isrow diagonally dominant, then the entries in the inverse decay columnwiseand vice versa [138]. When A is a banded SPD matrix, the entries of A−1

decay exponentially along each row or column; more precisely, if bij is theelement located at the i-th row and j-th column of A−1, then

|bij | ≤ Cγ|i−j| (3.1.1)

where γ < 1 and C > 0 are constant. In this case a banded M would be agood approximation to A−1 [49]. For many PDE problems the entries of theinverse exhibit some decaying behaviour and a good sparse pattern for theapproximate inverse can be computed in advance. However the constant Cin relation (3.1.1) can be very large and the decay unacceptably slow, or thedecay is non-monotonic and thus hardly predictable [139].

For sparse matrices, the nonzero structure of the approximate inversecan be computed based on graph information of the coefficient matrix. Thesparsity structure of a sparse matrix A of order n is represented by a directedgraph G(A) where the vertices are the integers {1, 2, ..., n} and the edgesconnect pairs of distinct vertices (i, j) corresponding to nonzero off-diagonalentries {aij} in A. The inverse will contain a nonzero in the (i, j) locationwhenever there is a directed path connecting vertex i to vertex j in G(A) [72].


Several heuristics can be used to traverse the graph along specific directionsand select a suitable subset of vertices of G(A) to construct the sparsitypattern of the approximate inverse. Benson and Frederickson [13] definethe structure for the j-th column of the approximate inverse in the case ofstructurally symmetric matrices with a full diagonal by selecting in G(A)vertex j and its q-th level nearest-neighbours. They called matrices definedwith these patterns as q-local matrices. A 0-local matrix has a diagonalstructure, while a 1-local matrix has the same sparsity pattern of A. Takingfor the sparse approximate inverse the same pattern of A generally workswell only for specific classes of problems; using more levels can improve thequality of the preconditioner but the storage can become prohibitive whenq is increased, and even q=2 is impractical in many cases [61].

The direction of the path in the graph can be selected based onphysical considerations dictated by the decay of the magnitude of the entriesobserved in the discrete Green’s function for many problems [139]. Thediscrete Green’s function can be considered as a row or as a column ofthe exact inverse depicted on the physical computational grid. Droppingor sparsification can help to identify the most relevant interactions inthe direct problem and select suitable search directions in the graph.For instance dropping entries of A smaller than a global threshold candetect anisotropy in the underlying problem and reveal it when noadditional physical information is available. Chow [33] proposes combiningsparsification with the use of patterns of powers of the sparsified matrixfor preconditioning linear systems arising from the discretization of PDEproblems. Sparsification can remarkably reduce the construction cost of thepreconditioner, and the use of matrix powers enables to retain the largestentries in the Green’s function. A post-processing stage, called filtration,can be included to drop small magnitude entries in the sparse approximateinverse, and reduce the cost of storing and applying the preconditioner.However, the choice of these parameters is problem-dependent and thisstrategy is not guaranteed to be effective on systems not arising from PDEs.

The difficulty in extracting a good sparsity pattern for the approximateinverse of matrices with a general sparsity pattern has motivated theinvestigation of adaptive strategies that compute the pattern of theapproximate inverse dynamically. The adaptive procedure known as SPAIhas been already described in Section 2.2.4. The procedure describedin [35] uses a few steps of an iterative solver, like the minimal residual,to approximately minimize the least-squares problems of relation 2.2.9.The sparsity pattern automatically emerges during the computation, anda dual threshold strategy is adopted to drop small entries either in thesearch directions or the iterates. To control costs, operations must beperformed in sparse-sparse mode, meaning that sparse matrix-sparse vectormultiplications are performed. These algorithms usually compute theapproximate inverse starting with an initial pattern and estimate the


accuracy of the preconditioner computed by monitoring the 2-norm of theresidual R = I −AM . If the norm is larger than a user-defined threshold orthe number of nonzeros used is less than a fixed maximum, the patternis enlarged according to some heuristics and the approximate inverse isrecomputed. The process is repeated until the required accuracy is notattained. We refer to these as adaptive procedures.

We have mentioned the problem of cost for the computation of SPAI.Fast convergence can be obtained for high values of the sparsity ratio, butthen the adaptive strategy requires a prohibitive time and computationalcost in a sequential environment. In general, adaptive strategies can solvemuch more general or hard problems but tend to be very expensive. The useof effective static pattern selection strategies can greatly reduce the amountof work in terms of CPU-time, and improve substantially the overall setupprocess, introducing significant scope for parallelism. Also, the memorystorage requirements and computational cost for the setup phase are knownin advance.

In the next sections, we investigate nonzero pattern selection strategiesfor the computation of sparse approximate inverses on electromagneticproblems. We consider both methods based on the magnitude of the entriesand methods which exploit geometric or topological information from theunderlying meshes. The pattern is computed in a preprocessing step andthen used to compute the entries of the preconditioner.

3.2 Pattern selection strategies for Frobenius-norm minimization methods inelectromagnetism

3.2.1 Algebraic strategy

The boundary element method discretizes integral equations on the surfaceof the scattering object, generally introducing a very localized strongcoupling among the edges in the underlying mesh. Each edge is stronglyconnected to only a few neighbours while, although not null, far-awayconnections are much weaker. This means that a very sparse matrix canstill retain the most relevant contributions from the singular integrals thatgive rise to dense matrices.

Owing to the decay of the discrete Green’s function, the inverse of Amay exhibit a very similar structure to A. Figure 3.2.1 shows the typicaldecay of the discrete Green’s function for Example 5, a scattering problemfrom a small sphere, which is representative of the general trend. In thedensity coloured plot, large to small magnitude entries in the inverse matrix

3.2. Pattern selection strategies for Frobenius-norm ... 57

are depicted in different colours, from red to green, yellow and blue. Thediscrete Green’s function peaks at a point, then it decays rapidly, and farfrom the diagonal only a small set of entries have large magnitude.

Figure 3.2.1: Pattern structure of A−1. The test problem is Example 5.

In this case, a good pattern for the sparse approximate inverse is likelyto be the nonzero pattern of a sparse approximation to A, constructedby dropping all the entries lower than a prescribed global threshold, assuggested for instance in [93]. We refer to this approach as the algebraicapproach.

The dropping heuristics described in Section 2.2 can be used to computethe sparse pattern for the approximate inverse. In [2], these approaches werecompared, observing similar results in the ability to cluster the eigenvaluesof the preconditioners. The first and the last heuristic are the simplest,and are more suitable for parallel implementation. In addition, the firstone has the advantage of placing the number of nonzero entries in theapproximate inverse under complete user-control, and of achieving a perfectload balancing in a parallel implementation. A drawback common to allheuristics is that we need some deus ex machina to find optimal values forthe parameters. In the numerical experiments, we have selected the strategywhere, for each column of A, the k entries (k ¿ n is a positive integer) oflargest modulus are retained.

The algebraic strategy generally works well and competes with theapproach that adaptively defines the nonzero pattern as implementedin the SPAI preconditioner described in reference [84]. Nevertheless it


suffers some drawbacks that put severe limits on its use in practicalapplications. For large problems, accessing all the entries of the matrixA becomes too expensive or even impossible. This is the case in the fastmultipole framework, where all the entries of the matrix A are not evenavailable. In addition on complex geometries, a pattern for the sparseapproximate inverse computed by using information solely from A maylead to a poor preconditioner. These two main drawbacks motivate theinvestigation of more appropriate techniques to define a sparsity patternfor the preconditioner.

Because we work in an integral equation context, we can use moreinformation than just the entries of the matrix of the discretized problem. Inparticular, we can exploit the underlying mesh and extract further relevantinformation to construct the preconditioner. Two types of information areavailable from the mesh:

the connectivity graph, describing the topological neighbourhood amongthe edges, and

the coordinates of the nodes in the mesh, describing geometricneighbourhoods among the edges.

3.2.2 Topological strategy

In the integral equation context that we consider, the surface of the object isdiscretized by a triangular mesh (see Figure 3.2.2). Each degree of freedom(DOF), representing an unknown in the linear system, corresponds to thevectorial flux across an edge in the mesh.

When the object geometries are smooth, only the neighbouring edges canhave a strong interaction with each other, while far-away connections aregenerally much weaker. Thus an effective pattern for the sparse approximateinverse can be prescribed by exploiting topological information related to thenear field. The sparsity pattern for any row of the preconditioner can bedefined according to the concept of level k neighbours, as introduced in [115].Figure 3.2.3 shows the hierarchical representation of the mesh in terms oftopological levels. Level 1 neighbours of a DOF are the DOF plus the fourDOFs belonging to the two triangles that share the edge corresponding tothe DOF itself. Level 2 neighbours are all the level 1 neighbours plus theDOFs in the triangles that are neighbours of the two triangles considered atlevel 1, and so forth.

In Figures 3.2.4 and 3.2.5 we plot, for each pair of DOFs of the meshfor Example 1, the magnitude of the associated entry in A and A−1 withrespect to their relative level of neighbours. The large entries in A−1 derivefrom the interaction of a very localized set of edges in the mesh so that byretaining a few levels of neighbours for each DOF an effective preconditioner


Figure 3.2.2: Example of discretized mesh.

Figure 3.2.3: Topological neighbours of a DOF in the mesh.

is likely to be constructed. Three levels can generally provide a good patternfor constructing an effective sparse approximate inverse. Using more levelsincreases the computational cost but does not improve substantially thequality of the preconditioner. We will refer to this pattern selection strategyas the topological strategy. In Figure 3.2.6 we show how the density ofnonzeros in the preconditioner evolves when the number of levels is increased.


It can be seen that for up to five levels the preconditioner is still sparse witha density lower than 10%. Considering too many topological levels maycause unnecessary introduction of nonzeros in the sparse approximation.Some of these nonzero entries do not contribute much to the quality of theapproximation.

Magnitude v.s. levels for A

Figure 3.2.4: Topological localization in the mesh for the large entries ofA. The test problem is Example 1 and is representative of the generalbehaviour.

3.2.3 Geometric strategy

When the object geometries are not smooth, two far-away edges in thetopological sense can have a strong interaction with each other so that theyare strongly coupled in the inverse matrix. For the scattering problem onExample 1, we plot in Figures 3.2.7 and 3.2.8, for the interaction of eachpair of edges in the mesh, the magnitude of the associated entry in A andA−1 with respect to their distance in terms of wavelength. The largestentries of A−1 on smooth geometries may come from the interaction of ageometrically localized set of entries in the mesh. If we construct the sparsepattern for the inverse by only using information related to A, we mayretain many small entries in the preconditioner, contributing marginally toits quality, but may neglect some of the large ones potentially damaging thequality of the preconditioner. Also, when the surface of the object is verynon-smooth, these large entries may come from the interaction of far-awayor non-connected edges in a topological sense, which are neighbours in ageometric sense. Thus they cannot be detected by using only topologicalinformation related to the near field. Figure 3.2.8 suggests that we can


Magnitude v.s. levels for A−1

Figure 3.2.5: Topological localization in the mesh for the large entries ofA−1. The test problem is Example 1 and is representative of the generalbehaviour.

select the pattern for the preconditioner using physical information, thatis: for each edge we select all those edges within a sufficiently large spherethat defines our geometric neighbourhood. By using a suitable size for thissphere, we hope to include the most relevant contributions to the inverseand consequently to obtain an effective sparse approximate inverse. Thisselection strategy will be referred to as the geometric strategy. In Figure 3.2.9we show how the density of nonzeros in the preconditioner evolves when theradius of the sphere increases.

3.2.4 Numerical experiments

In this section, we compare the different strategies described above in thesolution of our test problems.

Using the three pattern selection strategies for M , we denote by

• Ma, the preconditioner computed by using the algebraic strategy,

• Mt, the preconditioner computed by using the topological strategy,

• Mg, the preconditioner computed by using the geometric strategy,

• SPAI, the preconditioner constructed by using the dynamic strategyimplemented by [77] and described in Section 2.2.4.

To evaluate the effectiveness of the proposed strategies, we first considerusing the dense matrix A to construct the preconditioners Ma, Mt, Mg andSPAI. This requires the solution of large dense least-squares problems.


0 5 10 15 20 25 30 35 400

10

20

30

40

50

60

70

80

90

100

Levels

Per

cent

age

of d

ensi

ty o

f the

pat

tern

com

pute

d

Figure 3.2.6: Evolution of the density of the pattern computed for increasingnumber of levels. The test problem is Example 1. This is representative ofthe general behaviour.

The density of the preconditioner varies from one problem to anotherfor the same value of the distance parameter chosen to define Mg. AsFigure 3.2.8 shows, and tests on all the other examples confirm, those entries,corresponding to edges contained within a sphere of radius 0.12 times thewavelength, can retain many of the large entries of the inverse while givingrise to quite a sparse preconditioner. For all our numerical experiments, wechoose a value for k in the construction of Ma and SPAI, and for the levelof neighbours used to generate Mt so that they have the same density asMg, when necessary discarding some small entries of the preconditioner sothat all have the same number of entries.

As for the numerical experiments reported in the previous chapter, weshow results for different Krylov solvers. The stopping criteria in all casesjust consists in reducing the normwise backward error by 10−5. The symbol’-’ means that convergence was not obtained after 500 iterations. In eachcase, we took as the initial guess x0 = 0, and the right-hand side wassuch that the exact solution of the system was known. We performeddifferent tests with different known solutions, observing identical results.All the numerical experiments were performed in double precision complexarithmetic on a SGI Origin 2000 and the number of iterations reported inthis paper are for left preconditioning. Very similar results were obtainedwhen preconditioning from the right.

From the results shown in Table 3.2.1, we first note that all thepreconditioners accelerate the convergence of the Krylov solvers, and insome cases enable convergence when the unpreconditioned solver diverges


Magnitude v.s. distance for A

Figure 3.2.7: Geometric localization in the mesh for the large entries ofA. The test problem is Example 1. This is representative of the generalbehaviour.

or converges very slowly. These numerical experiments also highlight theadvantages of the geometric strategy. It not only outperforms the algebraicapproach and is more robust than the topological approach, which has asimilar computational complexity, but it also generally outperforms theadaptive approach implemented in SPAI which is much more sophisticatedand more expensive in execution time and memory. SPAI competes with Mg

only on Example 1 where the density of the preconditioner is higher. Thistrend, namely the denser the preconditioner the more efficient SPAI is, hasbeen observed on many other examples. However, for sparse preconditioners,SPAI may be quite poor, as illustrated on Example 4 where preconditionedGMRES(30) or Bi-CGSTAB are slower than without a preconditioner andthe iteration diverges for GMRES(10) with the SPAI preconditioner while itconverges for the other three preconditioners. On the non-smooth geometry,that is Example 2, an explanation of why the geometric approach should leadto a better sparse preconditioner can be suggested by Figure 3.2.10. Somefar-away edges in the connectivity graph, those from each side of the break,are weakly connected in the mesh but can have a strong interaction witheach other and can lead to large entries in the inverse matrix.


(b) Magnitude v.s. distance for A−1

Figure 3.2.8: Geometric localization in the mesh for the large entries ofA−1. The test problem is Example 1. This is representative of the generalbehaviour.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

10

20

30

40

50

60

70

80

90

100

Distance/Wavelength

Per

cent

age

of d

ensi

ty o

f the

pat

tern

com

pute

d

Figure 3.2.9: Evolution of the density of the pattern computed for largergeometric neighbourhoods. The test problem is Example 1. This isrepresentative of the general behaviour.

3.3. Strategies for the coefficient matrix 65

Example 1 - Density of M = 5.03%

Precond.GMRES(m) Bi -

CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - - - 251 202 223 231 175

Mj - - 465 222 174 239 210 169

Ma 219 135 96 72 72 86 107 72

Mt 100 49 36 36 36 35 42 32

Mg 124 68 46 46 46 44 58 38

SPAI - 67 44 44 44 48 50 43



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - - - 398 289 359 403 249

Mj - - 473 330 243 257 354 228

Ma 472 273 239 207 184 330 313 141

Mt - 470 346 243 195 187 275 158

Mg 90 72 55 52 52 44 82 40

SPAI - - 99 61 61 168 97 111



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - 224 191 158 147 177 170 118

Mj 350 211 178 153 140 188 152 110

Ma 212 157 141 132 123 131 145 115

Mt 288 187 160 146 139 145 156 98

Mg 63 51 41 41 41 37 47 32

SPAI - 370 184 112 84 256 96 85

Table 3.2.1: Number of iterations using the preconditioners based ondense A.

3.3 Strategies for the coefficient matrix

When the coefficient matrix of the linear system is dense, the constructionof even a very sparse preconditioner may become too expensive in executiontime as the problem size increases. Both memory and execution time aresignificantly reduced by replacing A with a sparse approximation. Ongeneral problems, this approach can cause a severe deterioration of thequality of the preconditioner; in the context of the Boundary ElementMethod (BEM), since a very sparse matrix can retain the most relevantcontributions to the singular integrals, it is likely to be more effective. The


Figure 3.2.10: Mesh of Example 2.

use of a sparse matrix substantially reduces the size of the least-squaresproblems that can then be efficiently solved by direct methods.

The algebraic heuristic described in the previous sections is well suitedfor sparsifying A. In [2] the same nonzero sparsity pattern is selected bothfor A and M ; in that case, especially when the pattern is very sparse, thecomputed preconditioner may be poor on some geometries. The effect ofreplacing A with its sparse approximation on some problems is highlightedin Figure 3.3.12 where we display the sparsified pattern of the inverse of thesparsified A. We see that the resulting pattern is very different from thesparsified pattern of the inverse of A shown in Figure 3.3.11.

A possible remedy is to increase the density in the patterns for bothA and M . To a certain extent, we can improve the convergence, but thecomputational cost of generating the preconditioner grows almost cubiclywith respect to density. A cheaper remedy is to choose a different numberof nonzeros to construct the patterns for A and M , with less entries in thepreconditioner than in A, the sparse approximation of A. To illustrate thiseffect, we show in Table 3.3.2 the number of iterations of preconditionedGMRES(50), where the preconditioners are built by using either the samesparsity pattern for A or a two, three or five times denser pattern for A.

Except when the preconditioner is very sparse, increasing the densityof the pattern imposed on A for a given density of M accelerates theconvergence as expected, getting quite rapidly very close to the numberof iterations required when using a full A. The additional cost in termsof CPU time is negligible as can be seen in Figure 3.3.13 for experimentson Example 1. This is due to the fact that the complexity of the QRfactorization used to solve the least-squares problems is the square of thenumber of columns times the number of rows. Thus, increasing the numberof rows, that is the number of entries of A, is much cheaper in terms ofoverall CPU time than increasing the density of the preconditioner, thatis the number of columns in the least-squares problems. Notice that this

3.3. Strategies for the coefficient matrix 67

sparsified(A−1)

Figure 3.3.11: Nonzero pattern for A−1 when the smallest entries arediscarded. The test problem is Example 5.

Example 1Percentage density of M

Density strategy1 2 3 4 5 6 7 8 9 10

Same - - 299 146 68 47 47 42 37 392 times - - 248 155 76 46 40 39 39 383 times - 253 207 109 49 39 39 37 35 345 times - 258 213 99 48 37 38 34 33 33Full A 364 359 144 96 46 35 35 34 32 31

Table 3.3.2: Number of iterations for GMRES(50) preconditioned withdifferent values for the density of M using the same pattern for A and largerpatterns. A geometric approach is adopted to construct the patterns. Thetest problem is Example 1. This is representative of the general behaviourobserved.

observation is true for both left and right preconditioning because, accordingto (2.2.8) and (2.2.9), the smaller dimension of the matrices involved inthe least-squares problems always corresponds to the entries of M to becomputed, and the larger to the entries of the sparsified matrix from A.


Figure 3.3.12: Sparsity pattern of the inverse of sparse A associated withExample 1. The pattern has been sparsified with the same value of thethreshold used for the sparsification of displayed in Figure 3.3.11.

3.4 Numerical results

We report in this section on the numerical results obtained by replacingA with its sparse approximation in the construction of the preconditioner.In Table 3.4.3 we use the following notation:

• Ma−a, introduced in [2] and computed by using algebraic informationfrom A. The same pattern is used for the preconditioner;

• Ma−t, constructed by using the algebraic strategy to sparsify A andthe topological strategy to prescribe the pattern for the preconditioner;

• Ma−g, constructed by using the geometric approach and an algebraicheuristic for A with the same density as for the preconditioner;

• M2a−t, similar to Ma−t, but the density of the pattern imposed on Ais twice as dense as that imposed on Ma−t;

• M2a−g, similar to Ma−g but, as in the previous case, the density of thepattern imposed on A is twice as dense as that imposed on Ma−g.

3.4. Numerical results 69

0 1 2 3 4 5 6 7 8 9 100

50

100

150

200

250

300

350

400

450

500

Density of the preconditioning matrix

CP

U−

time

for

the

cons

truc

tion

of th

e pr

econ

ditio

ner

1:1 3:1 5:1 Full A

Figure 3.3.13: CPU time for the construction of the preconditioner using adifferent number of nonzeros in the patterns for A and M . The test problemis Example 1. This is representative of the other examples.

For the sake of comparison we also report the number of iterationswithout using a preconditioner and with only a diagonal scaling, denoted byMj (j stands for Jacobi preconditoner).

Other combinations are possible for defining the selection strategies forthe patterns of A and M . Here we focus on the most promising ones that useinformation from the mesh to retain the large entries of the inverse, and thealgebraic strategy for A to capture the most relevant contributions to thesingular integrals. We also consider the preconditioner Ma−a to comparewith previous tests [2] that were performed on different geometries fromthose considered here. We show, in Table 3.4.3, the results of our numericalexperiments. For each example, we give the number of iterations requiredby each preconditioned solver.




CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - - - 251 202 223 231 175

Mj - - 465 222 174 239 210 169

Ma−a 284 170 138 114 92 120 156 94

Ma−t 179 61 45 45 45 43 58 36

Ma−g 147 93 68 59 59 55 73 53

M2a−t 128 56 40 40 40 37 50 36

M2a−g 131 79 52 51 51 59 65 44



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - - - 398 289 359 403 249

Mj - - 473 330 243 257 354 228

Ma−a - 319 255 221 203 181 319 135

Ma−t - 261 213 174 169 128 251 121

Ma−g 251 178 150 138 117 106 256 116

M2a−t - 370 284 202 182 176 276 127

M2a−g 100 73 61 55 55 48 93 40



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - - - - 488 - 444 308

Mj - - - 491 427 375 356 306

Ma−a 436 316 240 193 125 144 166 135

Ma−t 137 108 93 71 71 64 93 66

Ma−g - 464 296 203 108 240 166 144

M2a−t 113 78 59 53 53 41 61 44

M2a−g 122 84 72 59 59 53 67 50



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - 224 191 158 147 177 170 118

Mj 350 211 178 153 140 188 152 110

Ma−a 299 205 172 146 133 162 180 103

Ma−t 266 152 130 114 99 92 127 83

Ma−g 81 67 66 63 63 39 79 41

M2a−t 269 167 143 136 116 107 137 93

M2a−g 71 60 47 47 47 43 61 41

Continued on next page

3.4. Numerical results 71

Continued from previous page



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Unprec. - 344 233 146 125 152 170 109

Mj - 326 219 140 131 183 173 107

Ma−a - 352 249 154 134 202 183 107

Ma−t 360 66 64 60 60 34 76 46

Ma−g 313 81 68 61 61 36 74 40

M2a−t 71 48 47 47 47 25 54 30

M2a−g 88 42 39 39 39 21 45 25

Table 3.4.3: Number of iterations to solve the set of test problems.

Example 1 - Density of M = 5.03%Ma−a Ma−t M2a−t Ma−g M2a−g

83.42 91.07 91.78 79.47 80.18Example 2 - Density of M = 1.59%

Ma−a Ma−t M2a−t Ma−g M2a−g







27.66 70.93 71.29 26.04 26.13

Table 3.4.4: CPU time to compute the preconditioners.

In Table 3.4.4, we show the CPU time required to compute thepreconditioners when the least-squares problems are solved using LAPACKroutines. The CPU time for constructing Ma−t and M2a−t is in some casesmuch larger than that needed for Ma−g and M2a−g. The reason is that, inthe topological strategy, it is not possible to prescribe exactly a value forthe density. Thus, for each problem, we select a suitable number of levelsof neighbours, to obtain the closest number of nonzeros to that retained inthe pattern based on the geometric approach. After the construction of the


preconditioner, we drop its smallest entries to ensure an identical number ofnonzeros for the two strategies. The results illustrate that considering twiceas dense a pattern for A as for M does not cause a significant growth inthe computational time although it enables us to construct a more robustpreconditioner.

We first observe that using a sparse approximation of A reduces theconvergence rate of the preconditioned iterations when the nonzero patternimposed on the preconditioner is very sparse. However if we adopt thegeometric strategy to define the sparsity pattern for the approximate inverse,the convergence rate is not affected very much. For even larger valuesof density, the difference in the number of iterations between using fullA or an algebraic sparse approximation becomes negligible. For all theexperiments, Ma−g still outperforms Ma−a and is generally more robust thanMa−t; the most efficient and robust preconditioner is M2a−g. The multipledensity strategy allows us to improve the efficiency and the robustness of theFrobenius-norm preconditioner on this class of problems without requiringany more time for the construction of the preconditioner. For all the testexamples, it enables us to get the fastest convergence even for GMRES witha low restart parameter on problems where neither Ma−a nor Ma−g converge.

The effectiveness of this multiple density heuristic is illustrated inFigures 3.4.14 and 3.4.15 where we see the effect of preconditioning on theclustering of the eigenvalues of A for the most difficult problem, Example 2.The eigenvalues of the preconditioned matrices are in both cases wellclustered around the point (1.0,0.0) (with a more effective clustering forM2a−g), but those obtained by using the multiple density strategy arefurther from the origin. This is highly desirable when trying to improvethe convergence of Krylov solvers.

Another advantage of this multiple density heuristic is that it generallyallows us to reduce the density of the preconditioner (and thus itsconstruction cost), while preserving its numerical quality. Although nospecific results are reported to illustrate this aspect, this behaviour maybe partially observed in Table 3.3.2.


−0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

Real axis

Imag

inar

y ax

is

Figure 3.4.14: Eigenvalue distribution for the coefficient matrixpreconditioned by using a single density strategy on Example 2.


We have presented some a priori pattern selection strategies for theconstruction of a robust sparse Frobenius-norm minimization preconditionerfor electromagnetic scattering problems expressed in integral formulation.We have shown that, by using additional geometric information from theunderlying mesh, it is possible to construct robust sparse preconditionersat an affordable computational and memory cost. The topological strategyrequires less computational effort to construct the pattern, but since thedensity is a step function of the number of levels, the construction of thepreconditioner can require some additional computation. Also it may nothandle very well complex geometries where some parts of the object are notconnected. By retaining two different densities in the patterns of A and Mwe can decrease very much the computational cost for the construction of thepreconditioner, usually a bottleneck for this family of methods; preservingthe efficiency while increasing the robustness of the resulting preconditioner.Although sparsifying A using an algebraic dropping strategy seems to bethe most natural approach to get a sparse approximation of A when allits entries are available, either the topological or the geometric criterioncan be used to define the sparse approximation of A. Those alternativesare attractive in a multipole framework where all the entries of A are notcomputed. The geometric approach can be also used to sparsify A, withoutnoticeably deteriorating the quality of the preconditioner. This is shown inTable 3.5.5, where M2g−g is constructed by exploiting geometric information


−0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

Real axis

Imag

inar

y ax

is

Figure 3.4.15: Eigenvalue distribution for the coefficient matrixpreconditioned by using a multiple density strategy on Example 2.

in the patterns of both A and M , but choosing twice as dense a pattern for Aas for M . As suggested by Figure 3.2.4, due to the strongly localized couplingintroduced by the discretization of the integral equations, the topologicalapproach can also provide a good sparse approximation of A, by retainingjust a few levels of neighbouring edges for each DOF in the mesh. Thenumerical behaviour of this approach is illustrated in Table 3.5.6. In bothcases the resulting preconditioner is still robust and better suited for a fastmultipole framework since it does not require knowledge of the location ofthe largest entries in A.

M2g−g

ExampleGMRES(m) Bi -

CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

1 165 103 75 60 60 66 71 61

2 145 110 95 76 76 68 140 64

3 129 89 70 57 57 49 69 52

4 71 57 48 48 48 38 52 34

5 110 46 42 42 42 24 50 27

Table 3.5.5: Number of iterations to solve the set of test models by usinga multiple density geometric strategy to construct the preconditioner.The pattern imposed on M is twice as dense as that imposed on A.


M2t−g

ExampleGMRES(m) Bi -

CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

1 197 87 49 49 49 50 66 50

2 103 82 72 61 61 49 111 50

3 143 98 84 60 60 56 70 53

4 70 58 49 49 49 39 65 37

5 143 50 47 47 47 29 57 28

Table 3.5.6: Number of iterations to solve the set of test models byusing a topological strategy to sparsify A and a geometric strategy forthe preconditioner. The pattern imposed on M is twice as dense as thatimposed on A.

Chapter 4

Symmetric Frobenius-normminimization preconditionersin electromagnetism

In the previous chapter we have introduced and compared somestrategies to compute a priori the nonzero sparsity pattern for Frobenius-norm minimization preconditioners in electromagnetic applications. Theresults of the numerical experiments suggest that using additional geometricinformation from the underlying mesh, it is possible to construct very sparsepreconditioners and to make them more robust. In this chapter, we illustratethe numerical and computational efficiency of the proposed preconditioner.In Section 4.1, we assess the effectiveness of the sparse approximate inversecompared with standard methods for the solution of a set of model problemsthat are representative of real electromagnetic calculation. In Section 4.2,we complete the study considering two symmetric preconditioners based onFrobenius-norm minimization.

4.1 Comparison with standard preconditioners

In this section we want to assess the performance of the proposedFrobenius-norm minimization approach. In Table 4.1.1, we showthe numerical results observed on Examples 1-5 with some standardpreconditioners, of both explicit and implicit form. These are: diagonalscaling, SSOR, ILU(0), SPAI and SLU applied to a sparse approximationof A constructed using the algebraic approach. All these preconditioners,except SLU, exhibit much poorer acceleration capabilities than thatprovided by M2a−g. If we reduce the density of the preconditioner inExample 1 and 3, M2a−g converges slowly but becomes the most efficient.

77

78 4. Symmetric Frobenius-norm minimization preconditioners ...

It should also be noted that SPAI works reasonably well when computedusing dense A (see Table 3.2.1) but with sparse A it does not converge onExample 2 (see Table 4.1.1). In addition, following [35], we performed somenumerical experiments where we obtained an approximate m•j from (2.2.9)by dropping the smallest entries of the iterates computed by few stepsof either the Minimum Residual method or GMRES. Unfortunately, theperformance of these approaches for dynamically defining the pattern of thepreconditioner was disappointing. They only improved the unpreconditionedcase when a relative large number of iterations was used to build thepreconditioner making them unaffordable for our problems.

The purpose of this study is to understand the numerical behaviour ofthe preconditioners. Nevertheless, we do recognize that some of the simplestrategies have a much lower cost for building the preconditioner and so couldresult in a faster solution. When SSOR converges, it is often the fastest, interms of the CPU time for the overall solution of the linear system. Whenthe solution is performed for only one right-hand side, the construction costof the other preconditioners cannot be compensated for by the reductionin the number of iterations; the matrix-vector product is performed usingBLAS kernels that make the iteration cost quite cheap for the problem sizeswe have considered. For instance, when solving Example 1 with GMRES(50)on a SUN Enterprise, SSOR converges in 31.4 seconds and M2a−g requires190 seconds for the construction and 7.6 seconds for the iterations. However,in electromagnetism applications, the same linear system has to be solvedwith many right-hand sides when illuminating an object with various wavescorresponding to different angles of incidence. For that example, if we havemore than eight right-hand sides, the construction of M2a−g is overcomeby the time saved in the iterations and M2a−g becomes more efficient thanSSOR. In addition, the construction and the application of M2a−g is fullyparallelizable while the parallelization of SSOR requires some reordering ofequations that may be difficult to implement efficiently on a distributedmemory platform.

4.1. Comparison with standard preconditioners 79



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Mj - - 465 222 174 239 210 169

SSOR - - 216 136 98 147 177 135

ILU(0) - - - - - - 479 -

SPAI - - 192 68 68 150 83 94

SLU 160 53 38 38 38 46 50 39

M2a−g 131 79 52 51 51 59 65 44



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Mj - - 473 330 243 257 354 228

SSOR - 413 245 164 134 185 281 266

ILU(0) - - - - 322 385 394 439

SPAI - - - - - - - -

SLU - - - - 282 - - -

M2a−g 100 73 61 55 55 48 93 40



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Mj - - - 491 427 375 356 306

SSOR - 500 397 301 226 228 246 199

ILU(0) - - - 474 185 - 388 -

SPAI - - - 157 89 198 119 122

SLU 36 25 25 25 25 14 27 19

M2a−g 122 84 72 59 59 53 67 50



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Mj

SSOR 360 185 137 112 93 94 124 84

ILU(0) - 359 280 202 127 203 179 136

SPAI 99 78 59 55 55 49 72 53

SLU 99 78 59 55 55 49 72 53

M2a−g 71 60 47 47 47 43 61 41

Continued on next page


Continued from previous page



CGStabUQMR TFQMR

m=10 m=30 m=50 m=80 m=110

Mj

SSOR - 296 194 145 115 161 168 124

ILU(0) - - - - 414 345 389 272

SPAI - 454 196 124 91 118 118 96

SLU 115 68 52 52 52 29 59 42

M2a−g 88 42 39 39 39 21 45 25

Table 4.1.1: Number of iterations with some standard preconditionerscomputed using sparse A (algebraic).

4.2 Symmetrization strategies for Frobenius-normminimization method

The linear systems arising from the discretization by BEM can be symmetricnon-Hermitian in the Electric Field Integral Equation formulation (EFIE),or unsymmetric in the Combined Field Integral Equation formulation(CFIE). In this thesis, as mentioned in the previous chapters, we willonly consider cases where the matrix is symmetric because EFIE usuallygives rise to linear systems that are more difficult to solve with iterativemethods. Another motivation to focus only on the EFIE formulation isthat it does not require any restriction on the geometry of the scatteringobstacle as CFIE does and in this respect is more general. However, thesparse approximate inverse computed by the Frobenius-norm minimizationmethod is not guaranteed to be symmetric, and usually is not, even if asymmetric pattern is imposed on M , and consequently it might not fullyexploit all the characteristics of the linear system. This fact prevents theuse of symmetric Krylov solvers. To complete the earlier studies, in thissection we consider two possible symmetrization strategies for Frobenius-norm minimization using a prescribed pattern for the preconditioner basedon geometric information. As before, all the preconditioners are computedusing as input A, a sparse approximation of the dense coefficient matrix A.

If MFrob denotes the unsymmetric matrix resulting from theminimization (2.2.9), the first strategy simply averages its off-diagonalentries. That is

MAver−Frob =MFrob + MT

Frob

2. (4.2.1)

An alternative way to construct a symmetric sparse approximate inverseis to only compute the lower triangular part, including the diagonal, of the

4.2. Symmetrization strategies for Frobenius-norm minimization ... 81

preconditioner. The nonzeros calculated are reflected with respect to thediagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columns ofthe preconditioner. More precisely, in the computation of the k-th column ofthe preconditioner, the entries mik for i < k are set to mki that are alreadyavailable and only the lower diagonal entries are computed. The entries mki

are then used to update the right-hand sides of the least-squares problemswhich involve the remaining unknowns mik, for k ≥ i. The least-squaresproblems are as follows:

min‖ej − Am•j‖22 (4.2.2)

where ej = ej −∑

k<j a•kmkj , m•j = (0, ..., 0, mjj , ..., mnj)T . In thefollowing, this preconditioner is referred to as MSym−Frob. It should benoted that the preconditioner built using this approach no longer minimizesany Frobenius norm and it might be sensitive to the ordering of columns.In addition, if m denotes the number of nonzero entries in MSym−Frob, thismethods only computes (m+n)/2 nonzeros. Thus the overall computationalcomplexity for the construction of MSym−Frob can be considerably smallerthan for MAver−Frob as the least-squares problems are usually solved by QRfactorizations whose complexity is of the order of the square in the numberof unknowns and linear in the number of equations.

To study the numerical behaviour of these preconditioners, we considerthe same set of test examples considered for the experiments withunsymmetric preconditioners. We recall that, for physical consistency, wehave set the frequency of the incident wave for all the examples so thatthere are about ten discretization points per wavelength. We investigate thebehaviour of the preconditioners when used to accelerate restarted GMRES,amongst unsymmetric solvers, and symmetric QMR, denoted by SQMRin the forthcoming tables, amongst symmetric solvers. As in the previoustests, the stopping criterion in all cases just consists in reducing the originalresidual by 10−5 that then can be related to a normwise backward error.In all the tables, the symbol ’-’ means that convergence is not obtainedafter 500 iterations. All the numerical experiments are performed in doubleprecision complex arithmetic on a SGI Origin 2000 and the number ofiterations reported in this section are for right preconditioning. The numberof iterations for both GMRES and SQMR that actually also corresponds tothe number of matrix-vector products that is the most time consuming partof the algorithms. Nevertheless, it should be noted that for the other parts ofthe algorithms the coupled two term recurrences of SQMR are much cheaperthan the orthogonalization and least-squares solution involved in GMRES.From a memory point of view, SQMR is also much less demanding; if weused the same memory workspace for GMRES as for SQMR, the largestrestart would be 5.

In Table 4.2.2, we show the numerical behaviour of the different


Frobenius-norm minimization type preconditioners, both symmetric andunsymmetric. In the following we consider a geometric approach to definethe sparsity pattern for A, as it is the only one that can be efficientlyimplemented in a parallel fast multipole environment [23]. We compare theunsymmetric preconditioner MFrob and the two symmetric preconditionersMAver−Frob and MSym−Frob. The column entitled “Relative Flops” displays

the ratioσQR(M)

σQR(MFrob), where the σQR(M) represents the number of floating-

point operations required by the sequence of QR factorizations used to buildthe preconditioner M , that is either M = MAver−Frob or M = MSym−Frob.In this table, it can be seen that MAver−Frob almost always requiresless iterations than MSym−Frob that imposes the symmetry directly andconsequently only computes half of the entries. Since MSym−Frob computesless entries the associated values in the column “Relative Flops” are allless than one and close to a third in all cases. On the hardest test cases(Examples 1 and 3), the combination SQMR and MAver−Frob needs lessthan half the number of iterations of MFrob with GMRES(30) and is onlyvery slightly less efficient than MFrob and GMRES(80). On the less difficultproblems, SQMR plus MAver−Frob converges between 21 and 37% faster thanGMRES(80) plus MFrob and between 31 and 43% faster than GMRES(30)plus MFrob. MSym−Frob, that only computes half of the entries of thepreconditioner, has a poor convergence behaviour on the hardest problemsand is slightly less efficient than MAver−Frob on the other problems whenused with SQMR. Nevertheless, we should mention that, for the sake ofcomparison, those preliminary results have been performed using the set ofparameters for the density of A and M that were the best for MFrob andconsequently nearly optimal for MAver−Frob; the performance of MSym−Frob

might be improved as shown by the results depicted in Table 4.2.3. Thesefirst experiments reveal the remarkable robustness of SQMR when used incombination with a symmetric preconditioner. This combination generallyoutperforms GMRES even for large restarts.

The best alternative for significantly improving the behaviour ofMSym−Frob is to enlarge significantly the density of A and only marginallyincrease the density of the preconditioner. In Table 4.2.3, we show thenumber of iterations observed with this strategy that consists in using adensity of A that is three times larger than that for MSym−Frob; we recallthat for MAver−Frob and MFrob a density of A twice as large as that ofthe preconditioner is usually the best trade-off between computing cost andnumerical efficiency. It can be seen that MSym−Frob is slightly better thanMAver−Frob (as in Table 4.2.2) but it is less expensive to build. In thistable, we consider the same values for σQR(MFrob) as those in Table 4.2.2to evaluate the ratio “Relative Flops”.


Example 1 - Density of A = 10.13% - Density of M = 5.03%

Precond. GMRES(30) GMRES(80) GMRES(∞) SQMR Relative Flops

MFrob 108 60 60 * 1.00MAver−Frob 171 79 79 74 1.00MSym−Frob – – 301 – 0.25



MFrob 57 43 43 * 1.00MAver−Frob 59 44 44 34 1.00MSym−Frob 60 46 39 41 0.28










Table 4.2.2: Number of iterations on the test examples using the samepattern for the preconditioners.

Example DensityGMRES(m)

SQMR Relative Flopsm=30 m=80 m=∞

1 A =11.98% 172 68 68 67 0.40M = 6.10 %

2 A = 5.94% 56 41 41 33 0.30M = 2.04 %

3 A =11.01% 88 57 57 56 0.66M = 3.14 %

4 A = 2.08% 56 50 50 32 0.47M = 1.19 %

5 A = 1.98% 33 33 33 15 0.34M = 0.62 %

Table 4.2.3: Number of iterations for MSym−Frob combined with SQMRusing three times more non-zero in A than in the preconditioner.


To illustrate the effect of the densities of A and of the preconditioners,we performed experiments with preconditioned SQMR, where thepreconditioners are built by using either the same sparsity pattern for Aor a two, three or five times denser pattern for A. We report in Tables 4.2.4and 4.2.5 respectively the number of SQMR iterations for MSym−Frob, andfor MAver−Frob respectively. In these tables, MSym−Frob always requiresmore iterations than MAver−Frob for the same values of density for A andfor the preconditioner, but its computation costs about a quarter of the flopsfor each test.

Example 1

Percentage density of MDensity strategy

1 2 3 4 5 6 7 8 9 10

Same – – – – – 180 150 118 105 55

2.0 times – – – – – 67 56 48 91 42

3.0 times – – – – 393 55 52 47 74 39

5.0 times – – – – 346 53 50 45 56 39

Table 4.2.4: Number of iterations of SQMR with MSym−Frob withdifferent values for the density of M , using the same pattern for A andlarger patterns. The test problem is Example 1.

Example 1


1 2 3 4 5 6 7 8 9 10

Same – – – 336 78 55 55 45 38 40

2.0 times – – 426 105 81 50 48 43 43 44

3.0 times – 426 293 113 92 49 45 36 35 35

5.0 times – 315 248 114 80 44 38 37 37 35

Table 4.2.5: Number of iterations of SQMR with MAver−Frob withdifferent values for the density of M , using the same pattern for A andlarger patterns. The test problem is Example 1.

Because the construction of MSym−Frob is dependent on the orderingselected, a natural question concerns the sensitivity of the quality of thepreconditioner to this. In particular, in [54] it is shown that the numericalbehaviour of IC is very dependent on the ordering and a similar study andcomparable conclusion with AINV is described in [17]. In Table 4.2.6, wedisplay the number of iterations with SQMR, selecting the same density


parameters as those used for the experiments reported in Table 4.2.5, butusing different orderings to permute the original pattern of MSym−Frob.More precisely we consider the reverse Cuthil-MacKee ordering [37] (RCM),the minimum degree [71, 141] ordering (MD), the spectral nested dissectionordering [114] (SND) and lastly we reorder the matrix by putting the denserrows and columns first (DF). It can be seen that MSym−Frob is not toosensitive to the ordering and none of the tested orderings appears superiorto the others.

Example Density Original RCM MD SND DF

1 A =11.98% 67 93 93 75 87M = 6.10 %

2 A = 5.94% 33 41 40 40 44M = 2.04 %

3 A =11.01% 56 51 68 73 77M = 3.14 %

4 A = 2.08% 32 42 40 39 39M = 1.19 %

5 A = 1.98% 15 26 25 26 23M = 0.62 %

Table 4.2.6: Number of iterations of SQMR with MSym−Frob withdifferent orderings.

For comparison, in Table 4.2.7, we report on comparative resultsamongst different Frobenius-norm minimization type preconditioners, bothsymmetric and unsymmetric, obtained when the algebraic dropping strategyis used to sparsify the coefficient matrix. In this case, MAver−Frob alwaysperforms better than MSym−Frob but is at least three times more expensiveto compute. On Examples 1 and 3, the hardest test cases, the combinationSQMR and MAver−Frob needs up to 65% more iterations than GMRES(80)plus MFrob but competes with GMRES(30) plus MFrob. On the less difficultproblems, SQMR plus MAver−Frob converges between 18 and 35% faster thanGMRES(80) plus MFrob and between 20 and 47% faster than GMRES(30)plus MFrob. The best alternative to significantly improve the behaviour ofMSym−Frob remains to enlarge notably the density of A and only marginallythe density of the preconditioner. This can be observed in Table 4.2.8 wherewe show the number of iterations observed with this strategy that consistsin using a density of A that is at most three times larger than that ofMSym−Frob. Once again the behaviour of MSym−Frob is comparable to thatof MAver−Frob described in Table 4.2.7 but is less expensive to build.

In Tables 4.2.9 and 4.2.10 we illustrate the effect of the density of theapproximation of the original matrix and of the preconditioners on the


convergence of SQMR. The preconditioners are built by using either thesame sparsity pattern for A or a two, three or five times denser pattern forA. We report in Tables 4.2.9 and Table 4.2.10, respectively, the numberof iterations of SQMR iterations when an algebraic approach is used forA and a geometric approach is selected for MSym−Frob and MAver−Frob,respectively. If we compare these results with those reported in Table 4.2.4,it can be seen that, on hard problems, using geometric information evento prescribe the pattern of A is beneficial. MSym−Frob remains ratherinsensitive to the ordering as shown in the results of Table 4.2.11.


Precond. GMRES(30) GMRES(80) GMRES(110) SQMR Relative Flops

MFrob 79 51 51 * 1.00MAver−Frob 196 119 90 84 1.00MSym−Frob – – – – 0.25






MFrob 84 59 59 * 1.00MAver−Frob 119 74 74 74 1.00MSym−Frob – – – – 0.29







Table 4.2.7: Number of iterations on the test examples using the samepattern for the preconditioners. An algebraic pattern is used to sparsifyA.


Example DensityGMRES(m)

SQMR Relative Flopsm=30 m=80 m=∞

1 A =12% 360 79 79 79 0.41M = 6%

2 A = 5.97% 59 43 43 34 0.57M = 2.04 %

3 A =11.08% 171 76 76 78 0.66M = 3.14 %

4 A = 2.10% 51 44 44 31 0.47M = 1.19 %

5 A = 1.87% 33 33 33 14 0.34M = 0.62 %

Table 4.2.8: Number of iterations MSym−Frob combined with SQMRusing three times more non-zero in A than in the preconditioner. Analgebraic pattern is used to sparsify A.

Example 1


1 2 3 4 5 6 7 8 9 10

Same – – – – – – 494 364 440 90

2.0 times – – – – – 79 173 105 81 58

3.0 times – – – – – 64 66 71 45 55

5.0 times – – – – 346 52 70 56 40 41

Table 4.2.9: Number of iterations of SQMR with MSym−Frob withdifferent values for the density of M , using the same pattern for Aand larger patterns. A geometric approach is adopted to construct thepattern for the preconditioner and an algebraic approach is adopted toconstruct the pattern for the coefficient matrix. The test problem isExample 1.


Example 1


1 2 3 4 5 6 7 8 9 10

Same 391 – 433 99 89 48 50 38 36 36

2.0 times – 420 272 112 84 44 37 36 33 34

3.0 times 362 363 222 96 86 40 43 36 36 35

5.0 times – 365 251 100 76 40 38 34 35 36

Table 4.2.10: Number of iterations of SQMR with MAver−Frob withdifferent values for the density of M , using the same pattern for Aand larger patterns. A geometric approach is adopted to construct thepattern for the preconditioner and an algebraic approach is adopted toconstruct the pattern for the coefficient matrix. The test problem isExample 1.

Example Density Original RCM MD SND DF

1 A =12% 79 72 70 71 76M = 6%

2 A = 5.97% 34 39 39 35 39M = 2.04 %

3 A =11.08% 78 122 92 112 122M = 3.14 %

4 A = 2.10% 31 29 30 30 27M = 1.19 %

5 A = 1.87% 14 27 24 26 14M = 0.62 %

Table 4.2.11: Number of iterations of SQMR with MSym−Frob withdifferent ordering. An algebraic pattern is used to sparsify A.


In this chapter we have assessed the performance of the Frobenius-norm minimization preconditioner in the solution of dense complexsymmetric non-Hermitian systems of equations arising from electromagneticapplications. The set of problems used for the numerical experiments canbe representative of larger systems. We have also investigated the use ofsymmetric preconditioners which reflect the symmetry of the original matrixin the associated preconditioner, and enable us to use a symmetric Krylovsolver that might be cheaper than GMRES iterations. Both MAver−Frob

and MSym−Frob appear to be efficient and robust. Through numerical


experiments, we have shown that MSym−Frob was not too sensitive to columnordering while MAver−Frob is totally insensitive. In addition MAver−Frob

is straightforward to parallelize even though it requires more flops for itsconstruction. It would probably be the preconditioner of choice in a paralleldistributed fast multipole environment but possibilities for parallelizingMSym−Frob also exist, by using colouring techniques to detect independentsubsets of columns that can be computed in parallel. In a multipole contextthe algorithm must be recast by blocks, and Level 2 BLAS operations haveto be used for the least-squares updates. Finally, the major benefit of thesetwo preconditioners is the remarkable robustness they exhibit when used inconjunction with SQMR.

Chapter 5

Combining fast multipoletechniques and approximateinverse preconditioners forlarge parallelelectromagneticscalculations.

In this chapter we consider the implementation of the Frobenius-normminimization preconditioner described in Chapter 3 within a code thatimplements the Fast Multipole Method (FMM). We combine the sparseapproximate inverse preconditioner with fast multipole techniques for thesolution of huge electromagnetic problems. The chapter is organized asfollows: in Section 5.1 we quickly overview the FMM. In Section 5.2we describe the implementation of the Frobenius-norm minimizationpreconditioner in a parallel and multipole context that has been developedby [135]. In Section 5.3 we study the numerical and parallel scalability of theimplementation for the solution of large problems. Finally, in Section 5.4 weinvestigate the numerical behaviour of inner-outer iterative solution schemesimplemented in a multipole context with different levels of accuracy forthe matrix-vector products in the inner and outer loops. We consider inparticular FGMRES as the outer solver with an inner GMRES iterationpreconditioned by the Frobenius-norm minimization method. We illustratethe robustness and effectiveness of this scheme for the solution of problemswith up to one million unknowns.

91

92 5. Combining fast multipole techniques and approximate inverse ...

5.1 The fast multipole method

The FMM, introduced by Greengard and Rokhlin [82], providesan algorithm for computing approximate matrix-vector products forelectromagnetic scattering problems. The method is fast in the sense thatthe computation of one matrix-vector product costs O(n log n) arithmeticoperations instead of the usual O(n2) operations, and is approximate inthe sense that the relative error with respect to the exact computation isaround 10−3 [38, 135]. It is based on truncated series expansions of theGreen’s function for the electric-field integral equation (EFIE). The EFIEcan be written as

E(x) = −∫

Γ∇G(x, x′)ρ(x′)d3x′− ik

c

∫

ΓG(x, x′)J(x′)d3x′+EE(x), (5.1.1)

where EE is the electric field due to external sources, J(x) is the currentdensity, ρ(x) is the charge density and the constants k and c are thewavenumber and the speed of light, respectively. The Green’s function Gcan be expressed as

G(x, x′) =e−ik|x−x′|

|x− x′| . (5.1.2)

The EFIE is converted into matrix equations by the Method of Moments [86].The unknown current J(x) on the surface of the object is expanded into aset of basis functions Bi, i = 1, 2, ..., N

J(x) =N∑

i=1

JiBi(x).

This expansion is introduced in (5.1.1), and the discretized equation isapplied to a set of test functions. A linear system is finally obtained. Theentries in the coefficient matrix of the system are expressed in terms ofsurface integrals, and have the form

AKL =∫ ∫

G(x, y)BK(x) ·BL(y)dL(y)dK(x). (5.1.3)

When m-point Gauss quadrature formulae are used to compute the surfaceintegrals in (5.1.3), the entries of the coefficient matrix assume the form

AKL =m∑

i=1

m∑

j=1

ωiωjG(xKi , yLj )BK(xKi) ·BL(yLj ). (5.1.4)

Single and multilevel variants of the FMM exist and, for the multilevelalgorithm, there are adaptive variants that handle efficiently inhomogeneous

5.1. The fast multipole method 93

discretizations. In the one-level algorithm, the 3D obstacle is entirelyenclosed in a large rectangular domain, and the domain is divided into eightboxes (four in 2D). Each box is recursively divided until the length of theedges of the boxes of the current level is small enough compared with thewavelength. The neighbourhood of a box is defined by the box itself and its26 adjacent neighbours (eight in 2D). The interactions of degrees of freedomwithin nearby boxes are computed exactly from (5.1.4), where the Green’sfunction is expressed via (5.1.2). The contributions of far away cubes arecomputed approximately. For each far away box, the effect of a large numberof degrees of freedom is concentrated into one multipole coefficient, that iscomputed using truncated series expansion of the Green’s function

G(x, y) =P∑

p=1

ψp(x)φp(y). (5.1.5)

The expansion (5.1.5) separates the Green’s function into two sets of terms,ψi and φi, that depend on the observation point x and the source (orevaluation) point y, respectively. In (5.1.5) the origin of the expansion is nearthe source point and the observation point x is far away. Local coefficientsfor the observation cubes are computed by summing together multipolecoefficients of far-away boxes, and the total effect of the far field on eachobservation point is evaluated from the local expansions (see Figure 5.1.1for a 2D illustration). Local and multipole coefficients can be computed in apreprocessing step; the approximate computation of the far field enables usto reduce the computational cost of the matrix-vector product to O(n3/2)in the basic one-level algorithm.

In the hierarchical multilevel algorithm, the obstacle is enclosed in acube, the cube is divided into eight subcubes and each subcube is recursivelydivided until the size of the smallest box is generally half of a wavelength.Tree-structured data is used at all levels. In particular only non-emptycubes are indexed and recorded in the data structure. The resultingtree is called an oct-tree (see Figure 5.1.2) and we refer to its leaves asthe leaf-boxes. The oct-tree provides a hierarchical representation of thecomputational domain partitioned by boxes. Each box has one parent inthe oct-tree, except for the largest cube which encloses the whole domain,and up to eight children. Obviously, the leaf-boxes have no children.Multipole coefficients are computed for all cubes in the lowest level of theoct-tree, that is for the leaf-boxes. Multipole coefficients of the parentcubes in the hierarchy are computed by summing together contributionsfrom the multipole coefficients of their children. The process is repeatedrecursively until the coarsest possible level. For each observation cube,an interaction list is defined that consists of those cubes that are notneighbours of the cube itself but whose parent is a neighbour of the cube’sparent. In Figure 5.1.3 we denote by dashed lines the interaction list


for the observation cube in the 2D case. The interactions of degreesof freedom within neighbouring boxes are computed exactly, while theinteractions between cubes in the interaction list are computed using theFMM. All the other interactions are computed hierarchically on a coarserlevel traversing the oct-tree. Both the computational cost and the memoryrequirement of the algorithm are of order O(n log n). For further detailson the algorithmic steps see [39, 115, 124] and [38, 44, 45, 46] for recenttheoretical investigations. Parallel implementations of hierarchical methodshave been described in [78, 79, 80, 81, 126, 149].

Figure 5.1.1: Interactions in the one-level FMM. For each leaf-box, theinteractions with the gray neighbouring leaf-boxes are computed directly.The contribution of far away cubes are computed approximately. Themultipole expansions of far away boxes are translated to local expansionsfor the leaf-box; these contributions are summed together and the total fieldinduced by far away cubes is evaluated from local expansions.

5.2 Implementation of the Frobenius-normminimization preconditioner in the fastmultipole framework

An efficient implementation of the Frobenius-norm minimizationpreconditioner in the FMM context exploits the box-wise partitioning ofthe domain. The subdivision into boxes of the computational domain uses

5.2. Implementation of the Frobenius-norm minimization preconditioner ...95

Figure 5.1.2: The oct-tree in the FMM algorithm. The maximum numberof children is eight. The actual number corresponds to the subset of eightthat intersect the object (courtesy of G. Sylvand, INRIA CERMICS).

geometric information from the obstacle, that is the spatial coordinates ofits degrees of freedom. As we know from Chapter 3, this information canbe profitably used to compute an effective a priori sparsity pattern for theapproximate inverse. In the FMM implementation, we adopt the followingcriterion: the nonzero structure of each column of the preconditioner isdefined by retaining all the edges within a given leaf-box and those inone level of neighbouring boxes. We recall that the neighbourhood of abox is defined by the box itself and its 26 adjacent neighbours (eight in2D). The sparse approximation of the dense coefficient matrix is defined byretaining the entries associated with edges included in the given leaf-box aswell as those belonging to the two levels of neighbours. The actual entriesof the approximate inverse are computed column by column by solvingindependent least-squares problems. The main advantage of defining thepattern of the preconditioner and of the original sparsified matrix box-wiseis that we only have to compute a QR factorization per leaf-box. Indeedthe least-squares problems corresponding to edges within the same box areidentical because they are defined using the same nonzero structure and thesame entries of A. It means that the QR factorization can be performedonce and reused many times, improving significantly the efficiency of thecomputation. The preconditioner has a sparse block structure; each block isdense and is associated with one leaf-box. Its construction can use a differentpartitioning from that used to approximate the dense coefficient matrix andrepresented by the oct-tree. The size of the smallest boxes in the partitioning


Figure 5.1.3: Interactions in the multilevel FMM. The interactions for thegray boxes are computed directly. We denote by dashed lines the interactionlist for the observation box, that consists of those cubes that are notneighbours of the cube itself but whose parent is a neighbour of the cube’sparent. The interactions of the cubes in the list are computed using theFMM. All the other interactions are computed hierarchically on a coarserlevel, denoted by solid lines.

associated with the preconditioner is a user-defined parameter that can betuned to control the number of nonzeros computed per row, that is thedensity of the preconditioner. According to our criterion, the larger the sizeof the leaf-boxes, the larger the geometric neighbourhood that determinesthe sparsity structure of the columns of the preconditioner. Parallelism canbe exploited by assigning disjoint subsets of leaf-boxes to different processorsand performing the least-squares solutions independently on each processor.Communication is required to get information on the entries of the coefficientmatrix from neighbouring leaf-boxes.

5.3 Numerical scalability of the preconditioner

In this section we show results concerning the numerical scalabilityof the Frobenius-norm minimization preconditioner. They have beenobtained by increasing the value of the frequency and illuminating the sameobstacle. The surface of the object is always discretized using ten pointsper wavelength. We consider two test examples: a sphere of radius 1 metre

5.3. Numerical scalability of the preconditioner 97

and an Airbus aircraft (see Figure 5.3.4) that represents a real life modelproblem in an industrial context. In Table 5.3.1, we present the number of

Figure 5.3.4: Mesh associated with the Airbus aircraft (courtesy of EADS).The surface is discretized by 15784 triangles.

matrix-vector products using either GMRES(30) or TFQMR with a requiredaccuracy of 10−2 on the normwise backward error ||r||

||b|| , where r denotesthe residual and b the right-hand side of the linear system associated withthe experiments on the sphere. This tolerance is accurate for engineeringpurposes, as it enables us to detect correctly the radar cross section of theobject. The symbol ‘–’ means no convergence after 1500 iterations. InTable 5.3.2, we show the number of iterations and the parallel elapsed timeto build the preconditioner and to solve the linear system when its size isincreased. Similar information is reported for the experiments on the Airbusaircraft in Tables 5.3.3 and 5.3.4. All the runs have been performed in singleprecision on eight processors of a Compaq Alpha server. The Compaq Alphaserver is a cluster of Symmetric Multi-Processors. Each node consists offour Alpha processors that share 512 Mb of memory. On that computer the


temporary disk space that can be used by the out-of-core solver is around189 Gb.

Size of thelinear system

Density of thepreconditioner

FrequencyGHz

radius/λ GMRES(30) TFQMR

40368 1.16% 0.9 3 99 15271148 0.33% 1.2 4 83 171

112908 0.21% 1.5 5 96 134161472 0.15% 1.8 6 96 654221952 0.11% 2.1 7 438 —288300 0.08% 2.4 8 348 —549552 0.04% 3.3 11 532 —

1023168 0.02% 4.5 15 1196 —Table 5.3.1: Total number of matrix-vector products requiredto converge on a sphere on problems of increasing size -tolerance = 10−2. The size of the leaf-boxes in the oct-treeassociated with the preconditioner is 0.125 wavelengths.


GMRES(30)Disk memory used

MbytesConstruction

timeSolution

time71148 83 16.5 13 mins 3 mins

161472 96 37.8 30 mins 8 mins288300 348 67.9 55 mins 1 hour549552 532 129.7 1 h 45 mins 4 hours

1023168 1196 243.5 3 h 10 mins 1 dayTable 5.3.2: Elapsed time required to build thepreconditioner and by GMRES(30) to converge on a sphereon problems of increasing size on eight processors on aCompaq Alpha server - tolerance = 10−2.



FrequencyGHz

GMRES(30) TFQMR

23676 2.3 61 –94704 4.6 101 –

213084 6.9 225 –378816 9.2 – –591900 11.4 – –

1160124 16.1 – –Table 5.3.3: Total number of matrix-vector products requiredto converge on an aircraft on problems of increasing size -tolerance = 2 · 10−2.


GMRES(30)Disk memory used

MbytesConstruction

timeSolution

time23676 61 5.7 4 mins 3 mins94704 101 26.3 26 mins 13 mins

213084 225 63.7 54 mins 47 mins591900 – 169.9 2 h 30 mins –

1160124 – 338.8 3 h 15 mins –Table 5.3.4: Elapsed time required to build thepreconditioner and by GMRES(30) to converge on an aircrafton problems of increasing size on eight procs on a CompaqAlpha server - tolerance = 2 · 10−2.

The number of iterations and the computational cost grow rapidly withthe problem size. On the sphere, the number of iterations required byGMRES(30) is nearly constant for small problems, but increases linearlyfor larger problems. The solution by GMRES(30) of a scattering problemat 3.3 GHz frequency, discretized with half a million points, requires 532matrix-vector products and four hours of computation time to solve theassociated linear system. Nearly one day of computation is necessaryto solve the same problem at a frequency of 4.5 GHz. In this case thepertinent matrix has one million unknowns, and GMRES(30) requires 1196iterations to converge. Compared to GMRES, TFQMR exhibits a very poorconvergence behaviour. It never converges in less than 1500 matrix-vectorproducts on systems with more than two hundred thousand unknowns. TheAirbus aircraft is even more challenging to solve. On the smallest problem,


of size 23676, neither GMRES(30) nor TFQMR converge to 10−2 in 1500iterations, and 115 iterations are required by full GMRES. On a largertest, of size 94704, stagnation occurs for small or medium restarts and fullGMRES requires 625 iterations. In Figure 5.3.7, we show for this problemthe normwise backward error after 1500 iterations of GMRES for differentvalues of restart, when stagnation appears. Convergence to 10−2 is achievedonly when a very large restart (around 500) is selected. However, for largeproblems this choice might not be affordable because it is too demandingin terms of storage requirements. If we reduce the required accuracy to2 · 10−2 the convergence becomes obviously easier to achieve in a reasonableelapsed time and using an affordable memory cost, at least for medium-sizeproblems. As can be observed in Table 5.3.3, GMRES(30) converges inless than 1500 iterations on problems of size up to two hundred thousandunknowns. Although this tolerance may seem artificial, we checked at theend of the computation that the radar cross section of the obstacle wasaccurately determined. In Figures 5.3.5 and 5.3.6 we show the typical curvesof the radar cross section for an Airbus aircraft discretized with 200000unknowns. The quantity reported on the ordinate axis indicates the valueof the energy radiated back at different incidence angles. The RCS curvedepicted in Figure 5.3.5 is obtained when we require an accuracy of 2·10−2 onthe normwise backward error in the solution of the linear system. The RCScurve depicted in Figure 5.3.6 is obtained using another integral formulation,the CFIE, that is better conditioned and simpler to solve, and requiring anaccuracy of 10−6 on the normwise backward error in the iterative solution.The CFIE formulation is less general than EFIE but can be used on closedtargets like the Airbus aircraft. It can be observed that in both figuresthe pics are equally well approximated. Thus, for engineering purposes thesolution is still meaningful and can be exploited in the design process. Weuse this tolerance for the remaining numerical experiments on the Airbusaircraft.

In Table 5.3.5, we investigate the influence of the density on the qualityof the preconditioner on the aircraft. We adopt the same criterion describedin Section 5.2 to define the sparsity patterns but we increase the size ofthe leaf-boxes in the oct-tree associated with the preconditioner. The besttrade-off between cost and performance is obtained for 0.125 wavelengths,that is the default value set in the code. If the preconditioner is reused tosolve systems with the same coefficient matrix and multiple right-hand sides,it might be worth computing more nonzeros because the construction costcan be quickly amortized. If the size of the leaf-boxes is large enough, thepreconditioner is very effective in reducing the number of GMRES iterations;for values smaller than 0.1 wavelengths the preconditioner is very sparse andquite poor. For values larger than 0.2 the memory requirements exceed thelimits of our machine.

Finally, in Table 5.3.6, we show the parallel scalability of the


Figure 5.3.5: The RCS curve for an Airbus aircraft discretized with 200000unknowns. The problem is formulated using the EFIE formulation and atolerance of 2 · 10−2 in the iterative solution. The quantity reported on theordinate axis indicates the value of the energy radiated back at differentincidence angles.

Figure 5.3.6: The RCS curve for an Airbus aircraft discretized with 200000unknowns. The problem is formulated using the CFIE formulation and atolerance of ·10−6 in the iterative solution. The quantity reported on theordinate axis indicates the value of the energy radiated back at differentincidence angles.

implementation of the preconditioner in the FMM code [135]. We solveproblems of increasing size on a larger number of processors, keeping thenumber of unknowns per processor constant. We refer to [135] for a completedescription of the parallel code that we used.


0 50 100 150 200 250 300 350 400 450 500

0.01

0.012

0.014

0.016

0.018

0.02

0.022

0.024

0.026

0.028

0.03

Value of restart for GMRES

Nor

mw

ise

Bac

kwar

d E

rror

Normwise Backward Error after 1500 iterations of restarted GMRES

Figure 5.3.7: Effect of the restart parameter on GMRES stagnation on anaircraft with 94704 unknowns.

radius# nonzero

per row in M# mat-vecin GMRES

Constructiontime (sec)

Solutiontime (sec)

Overalltime (sec)

0.097 183 – 1275 – –0.110 235 472 1836 8121 99570.125 299 225 2593 2846 54390.141 372 – 4213 – –0.157 461 – 5866 – –0.176 569 278 7234 3637 108710.195 684 129 10043 1571 11614

Table 5.3.5: Elapsed time to build the preconditioner,elapsed time to solve the problem and total number ofmatrix-vector products using GMRES(30) on an aircraft with213084 unknowns - tolerance = 2 · 10−2 - eight processorsCompaq, varying the parameters controlling the density ofthe preconditioner. The symbol ’–’ means stagnation after1000 iterations.

5.4. Improving the preconditioner robustness using embedded iterations103

Problemsize

Nb procsConstruction time

(sec)Elapsed timeprecond (sec)

Elapsed timemat-vec (sec)

112908 8 513 0.39 1.77161472 12 488 0.40 1.95221952 16 497 0.43 2.15288300 20 520 0.45 2.28342732 24 523 0.47 3.10393132 28 514 0.47 3.30451632 32 509 0.48 2.80674028 48 504 0.54 3.70900912 64 514 0.60 3.80

Table 5.3.6: Tests on the parallel scalability of thecode relative to the construction and application of thepreconditioner and to the matrix-vector product operation onproblems of increasing size. The test example is the Airbusaircraft.

5.4 Improving the preconditioner robustnessusing embedded iterations

The numerical results shown in the previous section indicate that theFrobenius-norm minimization preconditioner tends to be less effective whenthe problem size increases. By its nature the sparse approximate inverse isinherently local because each degree of freedom is coupled only to a very fewneighbours. The compact support that we use to define the preconditionerdoes not allow an exchange of global information, and when the exactinverse is globally coupled this lack of global information may have a severeimpact on the quality of the preconditioner. In addition, in a multipolecontext, the density of the sparse approximate inverse tends to decrease forincreasing values of frequency because the size of the subdivision boxes getssmaller when the frequency of the problem is higher. For the solution ofproblems of large size it may be necessary to introduce some mechanismto recover global information on the numerical behaviour of the discreteGreen’s function. In this section we investigate the behaviour of inner-outer solution schemes implemented in the FMM context. We consider inparticular FGMRES [121] as the outer solver with an inner GMRES iterationpreconditioned with the Frobenius-norm minimization method. For theFGMRES method, we consider the implementation described in [64]. Themotivation that naturally leads us to consider inner-outer schemes is to try


Outer solver −→ FGMRES, FQMR

Do k=1,2, ...• M-V product: FMM with high accuracy

• Preconditioning : Inner solver (GMRES, TFQMR, ...)

Do i=1,2, ...

• M-V product: FMM with low accuracy

• Preconditioning : MFrob

End DoEnd Do

Figure 5.4.8: Inner-outer solution schemes in the FMM context. Sketch ofthe algorithm.

to balance the locality of the preconditioner with the use of the multipolematrix. The matrix-vector products within the outer and the inner solversare carried out at different accuracies. Highly accurate FMM is used withinthe outer solver that is used to actually solve the linear system, and a loweraccurate FMM within the inner solver is used as preconditioner for theouter scheme. In fact, we solve a nearby system for the preconditioningoperation. This enables us to save considerable computational effort duringthe iterative process. More precisely, the FMM accuracy is “high” for theFGMRES iteration (the relative error in the matrix-vector computation isaround 5 · 10−4 compared to the exact computation) and “medium” for theinner iteration (the relative error is around ·10−3). We present a sketch ofthe algorithm in Figure 5.4.8. One could apply this idea recursively andembed several FGMRES schemes with decreasing FMM accuracy down tothe lowest accuracy in the innermost GMRES. However, in our work we onlyconsider a two-level scheme. We will see that this is already quite effective.

Among the various possibilities, we select FGMRES(5) and GMRES(20)that seem to give the optimal trade-off, as the results reported in Table 5.4.7and 5.4.8 show for experiments on a sphere with 367500 points and an Airbusaircraft with 213084 points, respectively.


restartFGMRES

restartGMRES

max innerGMRES

total innermat-vec

total outermat-vec

Solutiontime (sec)

5 10 10 230 29 42115 10 20 231 15 35265 10 30 288 12 45445 20 20 180 12 27415 20 30 248 11 39675 20 40 246 9 3785

10 10 10 180 21 2912Table 5.4.7: Global elapsed time and total number of matrix-vector products required to converge on a sphere with 367500points varying the size of the restart parameters and themaximum number of inner GMRES iterations per FGMRESpreconditioning step - tolerance = 10−2 - eight processorsCompaq.

restartFGMRES

restartGMRES

max innerGMRES

total innermat-vec

total outermat-vec

Solutiontime (sec)

5 10 10 +200 +4000 –5 10 20 13 210 21985 10 30 12 288 31785 20 20 11 160 17685 20 30 9 186 20205 20 40 7 205 21835 30 30 9 180 1946

10 10 10 19 160 186110 20 20 10 160 182710 30 30 8 180 2123

Table 5.4.8: Global elapsed time and total number of matrix-vector products required to converge on an aircraft with213084 unknowns varying the size of the restart parametersand the maximum number of inner GMRES iterations perFGMRES preconditioning step - tolerance = 2 · 10−2 - eightprocessors Compaq.


The convergence history of GMRES depicted in Figure 5.4.9 for differentvalues of the restart gives us some clues to the numerical behaviour of theproposed scheme. The residual of GMRES tends to decrease very rapidly inthe first few iterations independently of the restarts, then decreases muchmore slowly, and finally stagnates to a value that depends on the restart;the larger the restart, the lower the stagnation value. It suggests that afew steps (up to 20) in the inner solver can be very effective for obtaining asignificant reduction of the initial residual. A different numerical behaviourhas been observed for the TFQMR solver as inner solver. The residual inthe beginning of the convergence is nearly constant or decreases very slowly.The use of this method as an inner solver is ineffective. Figure 5.4.9 alsoshows that large restarts of GMRES do not enable a further reduction ofthe normwise backward error in the beginning of convergence. Thus smallrestarts should be preferred in the inner GMRES iterations.

0 500 1000 15000.01

0.02

0.03

0.04

0.05

0.06

0.07

Number of M−V products

Nor

mw

ise

Bac

kwar

d E

rror

Convergence history of restarted GMRES for different values of restart

Restart = 10 " = 20 " = 30 " = 50 " = 80 " = 150 " = 300 " = 500

Figure 5.4.9: Convergence history of restarted GMRES for different valuesof restart on an aircraft with 94704 unknowns.

We show the results of some preliminary experiments in Tables 5.4.9and 5.4.10. We show the number of inner and outer matrix-vector productsneeded to achieve convergence on the sphere using a tolerance of 10−2 andon the Airbus aircraft using a tolerance of 2 · 10−2. We also give timings.The comparison with the results shown in Tables 5.3.1 and 5.3.3 is fairbecause GMRES(30) has exactly the same storage requirements as thecombination FGMRES(5)/GMRES(20). In fact, for the same restart value,the storage requirement for the FGMRES algorithm is twice that for thestandard GMRES algorithm, as it stores the preconditioned vectors of theKrylov basis. The combination FGMRES/GMRES remarkably enhancesthe robustness of the preconditioner on large problems. On the sphere


with 367500 points, it enables convergence in 16 outer and 252 total inneriterations whereas GMRES(30) does not converge in 1500 iterations. On thesphere with one million unknowns the elapsed time for the iterative solutionis reduced from 11 hours to 11

2 hours on 16 processors. The enhancement ofthe robustness of the preconditioner is even more significant on the Airbusaircraft as GMRES(30) does not converge on problem sizes larger thanaround 200000 unknowns. This can be observed in Table 5.4.10 and alsoin Figure 5.4.10 where we report on the normwise backward error after 100outer iterations of FGMRES for different values of restart for FGMRES.The value reported in this figure can be considered as the level of stagnationof the normwise backward error. The depicted curve can be compared to theone given in Figure 5.3.7. The normwise backward error is much smaller thanthat obtained with the standard GMRES at a comparable computationalcost. Finally, we mention that the combination FGMRES(5)/GMRES(20)does not converge on one problem, of size 378816, using a tolerance of 2·10−2.In fact, for some specific values of the frequency, resonance phenomena mayoccur in the associated physical problem, and the resulting linear systemcan become very ill-conditioned.


FGMRES(5) GMRES(20)Solution

time (sec)40368 7 105 2 mins71148 7 105 4 mins

112908 7 105 7 mins161472 9 126 13 mins221952 13 210 29 mins288300 13 210 37 mins367500 16 252 1 h 10 mins549552 17 260 1 h 50 mins

1023168 17 260 3 h 20 minsTable 5.4.9: Total number of matrix-vector products requiredto converge on a sphere on problems of increasing size -tolerance = 10−2.



FGMRES(5) GMRES(20)Solution

time (sec)23676 15 220 7 mins94704 7 100 9 mins

213084 11 160 36 mins591900 17 260 3 h 25 mins

1160124 19 300 8 h 42 minsTable 5.4.10: Total number of matrix-vector productsrequired to converge on an aircraft on problems of increasingsize - tolerance = 2 · 10−2.

0 50 100 1500.004

0.006

0.008

0.01

0.012

0.014

0.016

Value of restart for FGMRES

Nor

mw

ise

Bac

kwar

d E

rror

Normwise Backward Error after 100 iterations of restarted FGMRES

Figure 5.4.10: Effect of the restart parameter on FGMRES stagnation onan aircraft with 94704 unknowns using GMRES(20) as inner solver.


In this chapter, we have described the implementation of the Frobenius-norm minimization preconditioner within the code that implements theFast Multipole Method (FMM). We have studied the numerical and parallelscalability of the implementation for the solution of large problems, with upto one million unknowns, and we have investigated the numerical behaviourof inner-outer iterative solution schemes implemented in a multipole contextwith different levels of accuracy for the matrix-vector products in the


inner and outer loops. In particular we have shown that the combinationFGMRES(5)/GMRES(20) can effectively enhance the robustness of thepreconditioner, reducing significantly the computational cost and the storagerequirement for the solution of large problems. Most of the experimentsshown in this chapter require a huge amount of computation and storage,and they often reach the limits of our target machine in terms of Mbytes.For the solution of systems with one million unknowns direct methods wouldrequire 8 Tbytes of storage and 37 years of computation on one processor ofthe target computer (assuming the computation runs at peak performance).Some questions are still open. One issue concerns the optimal tuning ofthe inner accuracy of the FMM. In the numerical experiments we selected a“medium” accuracy for the inner iteration. We mention now as before thatusing a “lower” accurate FFM in the inner GMRES does not enable us toget convergence of the outer FGMRES. A multilevel scheme can be designedas a natural extension of the simple two-level scheme considered in thischapter, with several embedded FGMRES going down to the lowest accuracyin the innermost GMRES. An interesting further experiment might be to usevariants of these schemes, based on the FQMR method [136] as the outersolver and SQMR as the inner solver. The SQMR scheme is remarkablyrobust on these applications when used in combination with a symmetricFrobenius-norm minimization preconditioner such as those introduced inChapter 4.

Chapter 6

Spectral two-levelpreconditioner

In the previous chapter, we analysed the numerical behaviour of theFrobenius-norm minimization method for the solution of large problems.The numerical results indicate that the preconditioner is less effective whenthe problem size increases because of the inherently local nature of theapproximate inverse and the global behaviour of the equations. In thischapter, we introduce an algebraic multilevel strategy based on low-rankupdates for the preconditioner computed by using spectral information ofthe preconditioned matrix.

The chapter is organized in the following way. In Section 6.1, we motivatethe idea of the construction of multilevel preconditioners via low-rankupdates, and we provide a few references to similar work. In Section 6.2, wedescribe an additive formulation of the preconditioner for both unsymmetricand symmetric systems. We show the results of numerical experimentsto illustrate the computational and numerical efficiency of the algorithmon a set of model problems arising from electromagnetic calculations. InSection 6.3, we describe a multiplicative formulation of the preconditionerand give some comparative results. We conclude the chapter with some finalremarks and perspectives.


The construction of the Frobenius-norm minimization preconditioneris inherently local. Each degree of freedom in the approximate inverseis coupled to only a very few neighbours and this compact support doesnot allow an exchange of global information. When the exact inverseis globally coupled, the lack of global information may have a severe

111

112 6. Spectral two-level preconditioner

impact on the quality of the preconditioner. The discrete Green’s functionin electromagnetic applications exhibits a rapid decay, nevertheless theexact inverse is dense and thus has global support. The locality ofthe preconditioner can be reduced by increasing the number of nonzeroscomputed, but the construction cost grows almost cubicly with respect todensity. Enlarging the sparsity pattern imposed on A can be a cheaperremedy because the computational cost for the least-squares solution growsonly linearly with the number of rows. However, in a multipole contextwhere only the entries of the coefficient matrix associated with the near-field interactions are available, the computation of additional entries fromA requires the approximation of surface integrals.

In this chapter, we propose a refinement technique which enhances therobustness of the approximate inverse on large problems. The methodis based on the introduction of low-rank updates computed by exploitingspectral information of the preconditioned matrix. The purpose hereis to remove the effect of the smallest eigenvalues in magnitude in thepreconditioned matrix, which potentially can slow down the convergenceof Krylov solvers. We discussed in Chapter 2 that a clustered spectrumis highly desirable property for the rapid convergence of Krylov methods.In exact arithmetic the number of distinct eigenvalues would determinethe maximum dimension of the Krylov subspace. If the diameters ofthe clusters are small enough, the eigenvalues within each cluster behavenumerically like a single eigenvalue, and we would expect less iterationsof a Krylov method to produce reasonably accurate approximations. TheFrobenius-norm minimization preconditioner succeeds in clustering most ofthe eigenvalues far from the origin, nevertheless eigenvalues nearest zerocan potentially slow down convergence. Theoretical studies have relatedsuperlinear convergence of GMRES to the convergence of Ritz values [143].Basically, convergence occurs as if, at each iteration of GMRES, the nextsmallest eigenvalue in magnitude is removed from the system. As therestarting procedure destroys information about the Ritz values at eachrestart, the superlinear convergence may be lost. Thus removing the effectof small eigenvalues in the preconditioned matrix can have a beneficial effecton the convergence.

There are essentially two different approaches for exploiting informationrelated to the smallest eigenvalues during the iteration. The firstidea is to compute a few, k say, approximate eigenvectors of MAcorresponding to the k smallest eigenvalues in magnitude, and enlargethe Krylov subspace with those directions. At each restart, letu1, u2, ..., uk be approximate eigenvectors corresponding to the approximateeigenvalues of MA closest to the origin. The updated solution ofthe linear system in the next cycle of GMRES is extracted fromSpan{r0, Ar0, A

2r0, A3r0, ..., A

m−k−1r0, u1, u2, ..., uk}. This approach isreferred to as the augmented subspace approach (see [112, 113, 120]).


The approximate eigenvectors can be chosen to be Ritz vectors from theArnoldi method. The standard implementation of the restarted GMRES(m)algorithm is based on the Arnoldi process, and this allows us to recoverspectral information of MA during the iterations. Deflation techniques havebeen proposed in [94, 43].

The second idea exploits spectral information gathered during theArnoldi process to determine an approximation of an invariant subspace of Aassociated with the eigenvalues nearest the origin, and uses this informationto construct a preconditioner or to update the preconditioner. The idea ofusing exact invariant subspaces to improve the eigenvalue distribution wasproposed in [119]. Information from the invariant subspace associated withthe smallest eigenvalues and its orthogonal complement are used to constructa preconditioner in the approach proposed in [7]. This information can beobtained from the Arnoldi decomposition of a matrix A of size n that hasthe form

AVm = VmHm + fmeTm

where Vm ∈ Rn×m, fm ∈ Rn, em is the m-th unit vector of Rn, V TmVm =

Im, V Tmfm = 0, and Hm ∈ Rm×m is an upper Hessenberg matrix. If the

Arnoldi process is started from Vme1 = r0/‖r0‖, the columns of Vm spanthe Krylov subspace Km(A, r0). Let the matrix Vk ∈ Rk×n consist of thefirst k columns v1, v2, ..., vk of Vm, and let the columns of the orthogonalmatrix Wn−k span the orthogonal complement of Span{v1, v2, ..., vk}. AsW T

n−kWn−k = In−k, the columns of the matrix [Vk Wn−k] form anorthogonal basis of Rn. In [7] the inverse of the matrix

M = VkHkVTk + Wn−kW

Tn−k

is used as a left preconditioner. It can be expressed as:

M−1 = VkH−1k V T

k + Wn−kWTn−k.

At each restart, the preconditioner is updated by extracting neweigenvalues which are the smallest in magnitude. The algorithm proposeduses the recursion formulae of the implicitly restarted Arnoldi (IRA) methoddescribed in [132], and the determination of the preconditioner does notrequire the evaluation of any matrix-vector products with the matrix A inaddition to those needed for the Arnoldi process.

Another adaptive procedure to determine a preconditioner duringGMRES iterations was introduced in [59]. It is based on the sameidea of estimating the invariant subspace corresponding to the smallesteigenvalues. The preconditioner is based on a deflation technique such thatthe linear system is solved exactly in an invariant subspace of dimension rcorresponding to the smallest r eigenvalues of A.


Finally, a preconditioner for GMRES based on a sequence of rank-oneupdates that involve the left and right smallest eigenvectors is proposedin [92]. The method is based on the idea of translating isolated eigenvaluesconsecutively group by group into a vicinity of the point (1.0,0.0) usinglow-rank projections of the coefficient matrix of the form

A = A · (In + u1vH1 ) · ... · (In + ulv

Hl ).

The vectors uj and vj , j ∈ [1, l] are determined to ensure the numericalstability of consecutive translations of groups of isolated eigenvalues of A.After each restart of GMRES(m), approximations to the isolated eigenvaluesto be translated are computed by the Arnoldi process. The isolatedeigenvalues are translated towards the point (1.0,0.0) of the spectrum, andthe next cycle of GMRES(m) is applied to the transformed matrix. Theeffectiveness of this method relies on the assumption that most of theeigenvalues of A are clustered close to (1.0,0.0) in the complex plane.

Most of these schemes are combined with the GMRES procedure as theyderive information directly from its internal Arnoldi process. In our work,we consider an explicit eigencomputation which makes the preconditionerindependent of the Krylov solver used for the actual solution of the linearsystem.

6.2 Two-level preconditioner via low-rank spectralupdates

The Frobenius-norm minimization preconditioner succeeds in clusteringmost of the eigenvalues far from the origin. This can be observed inFigure 6.2.1 where we see a big cluster near (1.0,0.0) in the spectrumof the preconditioned matrix for Example 2. This kind of distribution ishighly desirable to get fast convergence of Krylov solvers. Nevertheless theeigenvalues nearest to zero potentially can slow down convergence. Whenwe use M2g−g it is difficult to remove all the smallest eigenvalues close tothe origin even if we increase the number of nonzeros.

In the next sections, we propose a refinement technique for theapproximate inverse based on the introduction of low-rank correctionscomputed by using spectral information associated with the smallesteigenvalues in MA. Roughly speaking, the proposed technique consistsin solving the preconditioned system exactly on a coarse space and usingthis information to update the preconditioned residual. We first presentour technique for unsymmetric linear systems and then derive a variant forsymmetric and SPD matrices.

6.2. Two-level preconditioner via low-rank spectral updates 115

−0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

Real axis

Imag

inar

y ax

is

Figure 6.2.1: Eigenvalue distribution for the coefficient matrixpreconditioned by the Frobenius-norm minimization method on Example 2.

6.2.1 Additive formulation

We consider the solution of the linear system

Ax = b, (6.2.1)

where A is a n × n complex unsymmetric nonsingular matrix, x and b arevectors of size n. The linear system is solved using a preconditioned Krylovsolver and we denote by M1 the left preconditioner, meaning that we solve

M1Ax = M1b. (6.2.2)

We assume that the preconditioned matrix M1A is diagonalizable, thatis:

M1A = V ΛV −1, (6.2.3)

with Λ = diag(λi), where |λ1| ≤ . . . ≤ |λn| are the eigenvalues and V = (vi)the associated right eigenvectors. We denote by U = (ui) the associated lefteigenvectors; we then have UHV = diag(uH

i vi), with uHi vi 6= 0, ∀i [147]. Let

Vε be the set of right eigenvectors associated with the set of eigenvalues λi

with |λi| ≤ ε. Similarly, we define by Uε the corresponding subset of lefteigenvectors.

Theorem 1 LetAc = UH

ε M1AVε,

Mc = VεA−1c UH

ε M1


andM = M1 + Mc.

Then MA is diagonalisable and we have MA = V diag(ηi)V −1 with

{ηi = λi if |λi| > ε,ηi = 1 + λi if |λi| ≤ ε.

ProofWe first remark that Ac = diag(λiu

Hi vi) with |λi| ≤ ε and then Ac is

nonsingular.Let V = (Vε, Vε), where Vε is the set of (n− k) right eigenvectors associatedwith eigenvalues |λi| > ε.Let Dε = diag(λi) with |λi| ≤ ε and Dε = diag(λj) with |λj | > ε.The following relations hold

MAVε = M1AVε + VεA−1c UH

ε M1AVε

= VεDε + VεIk

= Vε(Dε + Ik)where Ik denotes the (k × k) identity matrix, andMAVε = M1AVε + VεA

−1c UH

ε M1AVε

= VεDε + VεA−1c UH

ε VεDε

= VεDε as UHε Vε = 0.

We then have

MAV = V

(Dε + Ik 0

0 Dε

).

¥

Ac represents the projection of the matrix M1A on the coarse spacedefined by the approximate eigenvectors associated with its smallesteigenvalues.

Theorem 2 Let W be such that

Ac = WHAVε has full rank,

Mc = VεA−1c WH

andM = M1 + Mc.

Then MA is similar to a matrix whose eigenvalues are{

ηi = λi if |λi| > ε,ηi = 1 + λi if |λi| ≤ ε.


ProofWith the same notation as for Theorem 1 we have:MAVε = M1AVε + VεA

−1c WHAVε

= VεDε + VεIk

= Vε(Dε + Ik)MAVε = M1AVε + VεA

−1c WHAVε

= VεDε + VεC with C = A−1c WHAVε

=(

VεVε

)(CDε

)

We then have

MAV = V

(Dε + Ik C

0 Dε

).

¥For right preconditioning, that is AM1y = b, similar results hold.

Lemma 1 LetAc = UH

ε AM1Vε,

Mc = M1VεA−1c UH

ε

andM = M1 + Mc.

Then AM is diagonalisable and we have AM = V diag(ηi)V −1 with{


Lemma 2 Let W be such that

Ac = WHAM1Vε has full rank,

Mc = M1VεA−1c WH

andM = M1 + Mc.

Then AM is similar to a matrix whose eigenvalues are{


We should point out that if the symmetry of the preconditioner has to bepreserved an obvious choice exists. For left preconditioning, we can set W =Vε, that nevertheless does not imply that Ac has full rank. For SPD matricesthis choice leads to a SPD preconditioner. Indeed the preconditioner M isthe sum of a SPD matrix M1 and the low rank update that is symmetricsemi-definite; it can be noticed that in this case the preconditioner has asimilar form to those proposed in [24] for two-level preconditioners in non-overlapping domain decomposition.



In this section, we show some numerical results that illustrate theeffectiveness of the spectral two-level preconditioner for the solution of densecomplex symmetric non-Hermitian systems arising from the discretizationof surface integral equations in electromagnetism. In our experiments, theeigenpairs are computed in a preprocessing step, before performing theiterative solution. This makes the preconditioner independent of the Krylovsolver used for the actual solution of the linear system at the cost of thisextra computation. We use the IRA method implemented in the ARPACKpackage to compute approximations to the smallest eigenvalues and theircorresponding approximate eigenvectors. The methods implemented in theARPACK software are derived from a class of algorithms called Krylovsubspace projection methods. These methods use information from thesequence of vectors generated by the power method to compute eigenvectorscorresponding to eigenvalues other than the one with largest magnitude.In our experiments, we consider coarse spaces of dimension up to 20,and different values of restart for GMRES, from 10 to 110. For eachtest problem, we perform experiments with two levels of accuracy in theGMRES solution to gain more insight into the robustness of our method.We provide extensive results in Appendix A; in this chapter we showthe qualitative numerical behaviour of our method on our set of testexamples that can be representative of the general trend in electromagneticapplications. First we consider the unsymmetric formulation describedin Theorem 1. In Figures 6.2.2-6.2.6 we show the number of iterationsrequired by GMRES(10) to reduce the normwise backward error to 10−8

and 10−5 for increasing size of the coarse space. The numerical results showthat the introduction of the low-rank updates can remarkably enhance therobustness of the approximate inverse. By selecting up to 10 eigenpairsthe number of iterations decreases by at least a factor of 2 on most of theexperiments reported. The gain is more relevant when high accuracy isrequired for the approximate solution. On Example 2, the preconditioningupdates enable fast convergence of GMRES with a low restart within atolerance of 10−8 whereas no convergence was obtained in 1500 iterationswithout updates. However, a substantial improvement on the convergenceis observed also when low accuracy is required. In the most effective case,by selecting 10 corrections, the number of GMRES iterations needed toachieve convergence of 10−5 using low restarts reduces by greater thana factor of 4 on Example 5. If more eigenpairs are selected, generallyno substantial improvement is observed. In fact, the gain in terms ofiterations is strongly related to the magnitude of the shifted eigenvalues. Aspeedup in convergence is obtained when a full cluster of small eigenvaluesis completely removed. This is illustrated in Tables 6.2.1 and 6.2.2 where weshow the effect on the convergence of GMRES(10) of deflating eigenvalues


of increasing magnitude on Examples 2 and 5, that are representative ofthe general trend. On Example 2, the presence of a very small eigenvalueslows down the convergence significantly. Once this eigenvalue is shifted,the number of iterations rapidly decreases. On Example 5, there is a clusterof seven eigenvalues of magnitude around 10−3. When the eigenvalueswithin the cluster are shifted, a quick speed-up of convergence is observed;the shifting of the remaining eigenvalues does not have any impact onthe convergence. In Figure 6.2.7-6.2.9, we show the number of iterationsrequired by restarted GMRES to reduce the normwise backward error to10−8 for different values of restart and increasing size of the coarse space.The remarkable enhancement of the robustness of the preconditioner enablesthe use of very small restarts for GMRES.

0 2 4 6 8 10 12 14 16 18 2050

100

150

200

250

300

350

400

Size of the coarse space

Num

ber

of it

erat

ions

of G

MR

ES

(10)

Example 1 − Size = 1080 − IRAM tolerance = 0.1

GMRES Toler = 1.0e−8 " = 1.0e−5

Figure 6.2.2: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 and 10−5 for increasing sizeof the coarse space on Example 1.


0 2 4 6 8 10 12 14 16 18 2050

100

150

200

250

300

350


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5


Example 2Nr of shiftedEigenvalues

Magnitude ofthe eigenvalue

GMRES(10)Toler = 10−8

0 +15001 7.1116e-04 3102 4.9685e-02 3063 5.2737e-02 3084 6.3989e-02 3045 7.0395e-02 3096 7.7396e-02 3137 7.8442e-02 2468 8.9548e-02 2059 9.1598e-02 205

10 9.9216e-02 198

Table 6.2.1: Effect of shifting the eigenvalues nearest zero on the convergenceof GMRES(10) for Example 2. We show the magnitude of successivelyshifted eigenvalues and the number of iterations required when theseeigenvalues are shifted. A tolerance of 10−8 is required in the iterativesolution.


0 2 4 6 8 10 12 14 16 18 2050

100

150

200

250

300


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5


Example 5Nr of shiftedEigenvalues

Magnitude ofthe eigenvalue

GMRES(10)Toler = 10−8

0 2971 8.7837e-03 2902 8.7968e-03 2903 8.7993e-03 2874 9.8873e-03 2545 9.9015e-03 2326 9.9053e-03 3927 9.9126e-03 528 2.3331e-01 529 2.4811e-01 53

10 2.4813e-01 53

Table 6.2.2: Effect of shifting the eigenvalues nearest zero on the convergenceof GMRES(10) for Example 5. We show the magnitude of successivelyshifted eigenvalues and the number of iterations required when theseeigenvalues are shifted. A tolerance of 10−8 is required in the iterativesolution.


0 2 4 6 8 10 12 14 16 18 2020

40

60

80

100

120

140

160


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5


0 2 4 6 8 10 12 14 16 18 200

50

100

150

200

250

300

350

400


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5



0 2 4 6 8 10 12 14 16 18 200

50

100

150

200

250

300

350

400


Num

ber

of it

erat

ions

of G

MR

ES

(m)


m = 10m = 30m = 50

Figure 6.2.7: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 for three choices of restartand increasing size of the coarse space on Example 1.

In the second set of experiments, we consider the formulation of thepreconditioner illustrated in Theorem 2, where we select WH = V H

ε M1 tosave the computation of left eigenvectors. The quality of the preconditioneris very well preserved as we see in Tables 6.2.3-6.2.7. The construction costof the low-rank updates is twice as cheap.

In Table 6.2.8 we show the number of matrix-vector products required bythe ARPACK implementation of the IRA method to compute the smallestapproximate eigenvalues and the associated approximate right eigenvectors.All the numerical experiments are performed in double precision complexarithmetic on a SGI Origin 2000. We remark that the matrix-vectorproducts do not include those required for the iterative solution. Althoughthe computation can be expensive, the cost can be amortized if thepreconditioner is reused to solve linear systems with the same coefficientmatrix and several right-hand sides. In Table 6.2.9 we show the number ofamortization vectors relative to GMRES(10) and a tolerance of 10−5, thatis the number of right-hand sides that have to be considered to amortize theextra cost for the eigencomputation. The localization of a few eigenvalueswithin a cluster may be more expensive than the computation of a fullgroup of small eigenvalues. The optimal trade-off seems to be around 10. Inthat case the number of amortization vectors is reasonably small especiallycompared to real electromagnetic calculations where linear systems withthe same coefficient matrix and up to thousands of right-hand sides areoften solved.


0 2 4 6 8 10 12 14 16 18 2050

100

150

200

250

300

350

400

450

500


Num

ber

of it

erat

ions

of G

MR

ES

(m)


m = 10m = 30m = 50


0 2 4 6 8 10 12 14 16 18 200

50

100

150

200

250

300


Num

ber

of it

erat

ions

of G

MR

ES

(m)


m = 10m = 30m = 50



Example 1

Size of thecoarse space

Choice for the operator WH

WH = UHε M1 WH = V H

ε M1 W = Vε

1 314 315 3162 314 314 3123 313 314 3154 310 313 3085 313 306 3156 315 303 3117 315 298 2908 315 294 2929 315 303 302

10 248 244 24411 206 206 20412 197 190 21513 194 177 20814 192 177 18415 191 180 18616 189 184 18917 189 180 19518 175 180 20519 166 174 18220 153 174 173

Table 6.2.3: Number of iterations required by GMRES(10) preconditionedby a Frobenius-norm minimization method updated with spectralcorrections to reduce the normwise backward error by 10−8 for increasingsize of the coarse space on Example 1. Different choices are considered forthe operator WH .


Example 2




ε M1 W = Vε

1 310 285 3042 306 286 3053 308 286 3104 304 279 3035 309 286 3106 313 286 3077 246 229 2398 205 188 2019 205 187 202

10 198 185 19411 198 184 19412 198 196 19313 198 187 19314 185 190 19415 175 189 19316 186 183 18517 159 178 18318 192 179 18719 167 178 18520 187 168 169



Example 3




ε M1 W = Vε

1 267 254 2602 271 286 2673 263 284 2724 260 259 2565 255 269 2626 209 221 1997 209 222 2028 209 225 2089 137 133 135

10 127 126 12611 126 124 12512 115 117 11513 119 117 11814 119 119 12015 114 119 11016 104 105 10317 105 106 10518 103 105 10219 97 99 9420 96 96 90



Example 4




ε M1 W = Vε

1 145 145 1452 134 134 1353 133 130 1314 127 125 1265 126 123 1246 123 120 1227 101 101 1018 101 101 999 100 98 94

10 72 94 9311 95 93 8612 86 86 8613 86 86 8514 84 85 8315 82 82 8216 81 81 8217 81 82 8218 80 81 8219 82 81 8120 76 77 77



Example 5




ε M1 W = Vε

1 290 312 2902 290 311 2903 287 354 2874 254 345 2525 232 270 2146 392 559 4307 52 53 518 52 55 529 53 55 53

10 53 54 5211 53 53 5212 52 52 4913 58 53 4914 50 52 5015 51 52 5016 51 52 5017 51 52 5018 60 52 5019 59 53 5220 60 53 52




Number of Matrix-Vector products

Ex. 1 Ex. 2 Ex. 3 Ex. 4 Ex. 51 90 135 120 75 602 388 440 336 168 583 243 524 290 214 1074 281 469 250 178 1035 354 423 192 149 1636 293 357 183 180 1567 247 340 175 134 1288 198 333 165 236 1059 179 345 154 261 125

10 138 358 169 207 12811 186 527 157 191 16012 213 579 219 197 13113 189 574 224 248 16214 235 1010 212 309 12615 276 1762 223 355 16416 266 1053 202 412 23717 514 751 226 408 22718 336 3050 264 390 22319 336 2359 264 426 75620 650 1066 300 345 220

Table 6.2.8: Number of matrix-vector products required by the IRAMalgorithm to compute approximate eigenvalues nearest zero and thecorresponding right eigenvectors.

A natural question concerns the sensitivity of the preconditioner to theaccuracy of the approximate eigenvectors. In the numerical experimentswe require an accuracy of 0.1 on the computation of the eigenpairs. Thestopping criterion adopted in the ARPACK implementation of the IRAalgorithm assures small backward error on the Ritz pairs. The backwarderror is defined as the smallest perturbation ∆A, in norm, such that theRitz pair is an eigenpair for the perturbed system A + ∆A. At the endof the computation, we checked that the required accuracy was attained.If λ is an approximate eigenvalue and x is the corresponding approximateeigenvector, then the normwise backward error associated with the eigenpair(λ,x) is ‖r‖

α‖x‖ , where α > 0 and r = Ax − λx. In Table 6.2.10, the spectralinformation is computed at an accuracy of the order of the machineprecision, that is 10−16. No remarkable differences with the previous resultscan be observed in the number of iterations except for Example 2.



Number of Amortization Vectors

Ex. 1 Ex. 2 Ex. 3 Ex. 4 Ex. 51 9 135 - - 602 36 74 - 56 583 23 105 290 72 184 26 94 84 30 55 33 85 48 25 56 25 179 9 26 1567 21 14 8 7 28 17 8 8 11 29 15 8 3 12 2

10 4 8 3 5 211 3 12 3 8 212 4 13 4 8 213 3 13 4 10 214 4 16 4 12 215 4 26 4 13 216 4 19 3 15 317 8 10 4 15 318 5 53 4 49 319 4 33 4 16 1020 7 21 4 12 3

Table 6.2.9: Number of amortization vectors required by the IRAMalgorithm to compute approximate eigenvalues nearest zero and thecorresponding right eigenvectors. The computation of the amortizationvectors is relative to GMRES(10) and a tolerance of 10−5.

In Figures 6.2.11-6.2.14, we investigate the numerical behaviour of ourmethod in the presence of a larger cluster of small eigenvalues in M1A, thatis when M1 is a poor preconditioner. This generally happens when thenonzero structure of the approximate inverse is very sparse, or when lessinformation from A is used to construct M1. As we mentioned in Chapter 3and is shown in Figures 6.2.10 for Example 2, a side-effect of reducing thenumber of nonzeros in the sparse approximation of A is that a larger numberof eigenvalues cluster around the origin of the spectrum of the preconditionedmatrix. In the experiments reported in Figures 6.2.11-6.2.14 the Frobenius-norm preconditioner is constructed using the same nonzero structure tosparsify A and to compute M1. As the results show, when the preconditioneris not very effective the spectral corrections, although beneficial, do not


Example 1


Numberof iterations of GMRES(10)

Ex 1 Ex 2 Ex 3 Ex 4 Ex 51 315 215 254 145 3122 314 202 286 134 3113 314 193 284 130 3544 313 192 259 125 3465 306 190 269 123 2706 303 189 221 120 5527 298 161 222 101 538 294 147 225 101 559 303 146 133 99 54

10 244 144 126 94 5411 206 143 124 93 5012 190 143 117 86 5013 177 140 117 86 4814 177 139 119 85 4815 182 139 117 82 4816 184 139 106 81 4817 171 139 106 81 4818 176 135 102 81 4819 177 135 94 80 4720 178 131 99 80 50

Table 6.2.10: Number of iterations required by GMRES(10) preconditionedby a Frobenius-norm minimization method updated with spectralcorrections to reduce the normwise backward error by 10−8 for increasingsize of the coarse space. The formulation of Theorem 2 with the choiceWH = V H

ε M1 is used for the low-rank updates. The computation of Ritzpairs is carried out at machine precision.

enhance its robustness significantly. Coarse spaces of larger size may benecessary to shift the clustered eigenvalues nearest zero and speed up theconvergence. The localization of the eigenvalues of smallest magnitude bythe IRA method is much more expensive in this situation, as illustrated inthe numerical experiments reported in Appendix A.


−0.5 0 0.5 1 1.5−1.5

−1

−0.5

0

0.5

Real axis

Imag

inar

y ax

is

Figure 6.2.10: Eigenvalue distribution for the coefficient matrixpreconditioned by a Frobenius-norm minimization method on Example 2.The same sparsity pattern is used for A and for the preconditioner.

0 2 4 6 8 10 12 14 16 18 20200

300

400

500

600

700

800

900


Num

ber

of it

erat

ions

of G

MR

ES

(10)


Toler = 1.0e−8 " = 1.0e−5

Figure 6.2.11: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce thenormwise backward error by 10−8 and 10−5 for increasing size of the coarsespace on Example 1. The formulation of Theorem 2 with the choice WH =V H

ε M1 is used for the low-rank updates. The same nonzero structure is usedfor A and M1.


0 2 4 6 8 10 12 14 16 18 20200

250

300

350

400

450

500

550


Num

ber

of it

erat

ions

of G

MR

ES

(10)


Toler = 1.0e−8 " = 1.0e−5



0 2 4 6 8 10 12 14 16 18 2050

100

150

200

250

300

350


Num

ber

of it

erat

ions

of G

MR

ES

(10)


Toler = 1.0e−8 " = 1.0e−5




0 2 4 6 8 10 12 14 16 18 20100

120

140

160

180

200

220

240

260

280


Num

ber

of it

erat

ions

of G

MR

ES

(10)


Toler = 1.0e−8 " = 1.0e−5




6.2.3 Symmetric formulation

One problem with the previous formulations is that the updatedpreconditioner M is no longer symmetric even if M1 is symmetric. Asymmetric formulation can be obtained if we choose W = Vε in Theorem 2.Nevertheless we point out that, as in the case WH = V H

ε M1, the projectedmatrix Ac is not guaranteed to have full rank. For SPD matrices this choicenaturally leads to a SPD preconditioner.

In Tables 6.2.3-6.2.7, we show experiments with this choice for theoperator W . The method is still effective as no remarkable deterioration canbe observed in the quality of the preconditioner computed. In Figures 6.2.15-6.2.19, we use the symmetric Frobenius-norm minimization method obtainedby averaging the off-diagonal entries, and we solve the linear systemby the SQMR algorithm. The remarkable robustness of this solver onelectromagnetic applications should be noted, as it clearly outperformsGMRES with large restart.

0 2 4 6 8 10 12 14 16 18 2020

30

40

50

60

70

80

90

100

110


Num

ber

of it

erat

ions

of S

QM

R


SQMR Toler = 1.0e−8 " = 1.0e−5

Figure 6.2.15: Number of iterations required by SQMR preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−5 for increasing size of thecoarse space on Example 1. The symmetric formulation of Theorem 2 withthe choice W = Vε is used for the low-rank updates.


0 2 4 6 8 10 12 14 16 18 2040

60

80

100

120

140

160

180


Num

ber

of it

erat

ions

of S

QM

R


SQMR Toler = 1.0e−8 " = 1.0e−5


0 2 4 6 8 10 12 14 16 18 2030

40

50

60

70

80

90

100


Num

ber

of it

erat

ions

of S

QM

R


SQMR Toler = 1.0e−8 " = 1.0e−5



0 2 4 6 8 10 12 14 16 18 2015

20

25

30

35

40

45

50

55

60

65


Num

ber

of it

erat

ions

of S

QM

R


SQMR Toler = 1.0e−8 " = 1.0e−5


0 2 4 6 8 10 12 14 16 18 2010

15

20

25

30

35

40

45

50

55


Num

ber

of it

erat

ions

of S

QM

R


SQMR Toler = 1.0e−8 " = 1.0e−5


6.3. Multiplicative formulation of low-rank spectral updates 139

6.3 Multiplicative formulation of low-rank spectral updates

The spectral information that we compute can be exploited differentlyif we look at it from a multigrid view point. This leads us to derive amultiplicative version of our two-level preconditioner that can be expressedas a two-grid algorithm. In order to illustrate this link, let us first brieflydescribe the classical geometric two-grid algorithm.

For solving a linear system Ax = b with initial guess x0 where A comesfrom a discretization of an elliptic operator on a mesh, a geometric two-gridalgorithm can be briefly described as follows:

1. Pre-smoothing:a few iterations are performed to damp the high frequencies of theerror. The components that are eliminated belong to the subspacespanned by the eigenvectors associated with the large eigenvalues ofthe pre-smoothing iteration matrix. One iteration of this pre-smoothermight be written

xnew = xold + B(b−Axold) (6.3.4)

and let xk+ 13 denotes the approximate solution after µ1 pre-smoothing

iterations.

2. Coarse grid correction:the components left in the error are smooth and then can berepresented on a coarser mesh. Consequently the residual is projectedonto the coarse mesh and the error equation is solved exactly inthe associated coarse space. The error on the coarse mesh is theninterpolated back into the fine mesh to correct xk+ 1

3 . If we denoteby R the projection operator and by P the prolongation/interpolationoperator, the coarse grid problem is usually defined by the Galerkinformula Ac = RAP . The coarse grid correction can then be written

xk+ 23 = xk+ 1

3 + PA−1c R(b−Axk+ 1

3 ) (6.3.5)

3. Post-smoothing:a few additional smoothing iterations are performed to eliminatethe possible high frequency that might have been introduced bythe interpolation. We can then compute the new iterate xk+1 byreperforming µ2 iterations of the iterative scheme (6.3.4) using theinitial guess xk+ 2

3 .

In a geometric multigrid algorithm, the coarse grid correction effectivelysolves the error equation restricted to the subspace associated with the


smallest eigenvalues. The corresponding eigenvectors are associated withthe low frequency mode and consequently can geometrically be representedon a coarser mesh.

In our multiplicative algorithm we make the “coarse grid” correctionexplicit by actually projecting the error equation directly onto the subspaceassociated with the smallest eigenvalues. More precisely, the smootheris defined by our preconditioner M1, the restriction is R = Vε and theprolongation is UH

ε . The preconditioning operation is performed in threedistinct steps and for the sake of simplicity for this exposition we set µ1 =µ2 = 1. The first step consists of a sweep with the sparse approximate inverseM1, that is z = M1p where p is the vector to precondition. The secondstep is intended to correct some components of the preconditioned vector zalong the directions defined by the approximate eigenvectors correspondingto the approximate eigenvalues smallest in magnitude. Using the notationof Section 6.2.1

z = z + VεA−1c Uε(p−Az).

Finally, the sparse approximate inverse is used to refine the preconditioningoperation in the complement to the subspace determined by approximateeigenvectors corresponding to the smallest eigenvalues

z = z + M1(p−Az).

The nomenclature multiplicative formulation is inherited from theframework of domain decomposition methods as similarities exist withSchwarz methods [26, 130]. In addition, with µ1 = µ2 = 1 the two-stepcorrection can be expressed in the following compact form

z = z + B(p−Az)

where B = (I − (I −M1)(I − VεA−1c Uε))A−1.


In this section, we show the qualitative numerical behaviour of our methodon our set of test examples. In Figures 6.3.20, 6.3.21 and 6.3.22, we show thenumber of iterations required by restarted GMRES to reduce the residualto a prescribed accuracy for increasing size of the coarse space. As before inthe extensive results reported in Appendix A we consider coarse spaces ofincreasing dimension, up to 20, and values of restart for GMRES from 10 to110. The preconditioner is very effective as shown in Figures 6.3.20, 6.3.21and 6.3.22. Compared to the additive formulation a larger reduction in termsof iteration count is observed on all the five examples. However, we pointout that each iteration step requires two additional matrix-vector products

6.3. Multiplicative formulation of low-rank spectral updates 141

which makes this formulation always more expensive than the additive one.Finally, we mention that this formulation naturally leads to a symmetricpreconditioner if M1 is symmetric. Thus the SQMR algorithm can be usedto solve the problem and the results obtained with this solver are shown inAppendix A. It should be noted that the results are surprisingly poor ontwo test problems, Examples 2 and 5.

0 2 4 6 8 10 12 14 16 18 200

50

100

150

200

250

300

350

400


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5

Figure 6.3.20: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reducethe normwise backward error by 10−8 and 10−5 for increasing numberofcorrections on Example 1. The symmetric formulation of Theorem 2 withthe choice W = Vε is used for the low-rank updates. The preconditioner isupdated in multiplicative form.


0 2 4 6 8 10 12 14 16 18 200

50

100

150

200

250

300


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5

Figure 6.3.21: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce thenormwise backward error by 10−8 and 10−5 for increasing size of the coarsespace on Example 3. The symmetric formulation of Theorem 2 with thechoice W = Vε is used for the low-rank updates. The preconditioner isupdated in multiplicative form.

0 2 4 6 8 10 12 14 16 18 2020

40

60

80

100

120

140

160


Num

ber

of it

erat

ions

of G

MR

ES

(10)


GMRES Toler = 1.0e−8 " = 1.0e−5

Figure 6.3.22: Convergence of GMRES preconditioned by a Frobenius-norm minimization method updated with spectral corrections to reduce thenormwise backward error by 10−8 and 10−5 for increasing size of the coarsespace on Example 4. The symmetric formulation of Theorem 2 with thechoice W = Vε is used for the low-rank updates. The preconditioner isupdated in multiplicative form.



In this chapter, we have presented a refinement technique for theapproximate inverse based on low-rank corrections computed by usingspectral information from the preconditioned matrix. We have shown theeffectiveness and the robustness of the resulting preconditioner on a set ofsmall but tough problems arising from electromagnetic applications. Themethod is very well suited for its use on electromagnetic problems as thepreconditioner is often used to solve systems with the same coefficientmatrix and multiple right-hand sides. In this way, the extra cost forthe computation of the preconditioner updates can be amortized. It canbe combined with inner-outer schemes via embedded iterations describedin the previous chapter to construct robust and efficient preconditionerson electromagnetic applications, but we do not carry out this study inthis thesis. Moreover, this technique can be used for general problems.Preliminary results on domain decomposition methods [117] and SPDmatrices from the Harwell-Boeing collection [53] are encouraging.

Chapter 7

Conclusions and perspectives

In this thesis, we have presented preconditioning methods forthe numerical solution, using iterative Krylov solvers, of densecomplex symmetric non-Hermitian systems of equations arising from thediscretization of boundary integral equations in electromagnetism. We haveillustrated both the numerical behaviour and the cost of the proposedpreconditioners, identified potential causes of failure and introducedtechniques to enhance their robustness. The major concern of the thesishas been to design robust sparse approximate inverse preconditioners basedon Frobenius-norm minimization techniques. However, in Chapter 2,we considered several standard preconditioners based on the idea ofsparsification, of both implicit and explicit type, and we studied theirnumerical behaviour on electromagnetic applications.

We have shown that incomplete LU factorization methods do not workwell for such systems. The incomplete factorization process is highlyunstable on indefinite matrices like those arising from the discretization ofthe EFIE formulation. Using numerical experiments we have shown that thetriangular factors computed by the factorization can be very ill-conditioned,and the long recurrences associated with the triangular solves are unstable.As an attempt at a possible remedy, we introduced a small complex shift tomove the eigenvalues of the preconditioned system along the imaginary axisand thus try to avoid a possible cluster of eigenvalues close to zero. A smalldiagonal complex shift can help to compute a more stable factorization,and in some cases the performance of the preconditioner can significantlyimprove. Further work is required to make the preconditioner morerobust. Condition estimators can be incorporated into the factorizationprocess to detect instabilities during the computation, and suitable strategiesintroduced to tune the optimal value of the shift and to predict its effect.The construction of the preconditioner is inherently sequential but manyrecent research efforts have been designed to exploit parallelism [102, 111].

145

146 7. Conclusions and perspectives

This gives hope that it might worth examining further this method in aparallel and multipole context.

Factorized approximate inverses, namely AINV and FSAI, exhibitpoor convergence behaviour because the inverse factors can be totallyunstructured; both reordering and shift strategies do not improve theireffectiveness. Any dropping strategy, either static or dynamic, may bevery difficult to tune as it can easily discard relevant information andpotentially lead to a very poor preconditioner. In this case, finding theappropriate threshold to enable a good trade-off between sparsity andnumerical efficiency is challenging and very problem-dependent. Graphpartitioning algorithms can be used to define a sparse structure for theinverse factors. Geometric and spectral partitioning methods would splitthe graph of the sparse approximation A to A into a number, say p,of independent subgraphs of roughly equal size, with relatively smallconnections. By numbering the interior nodes first and the interface nodeslast, the permuted matrix assumes the form

P T AP =

A1 BT1

A2 BT2

. . ....

Ap BTp

B1 B2 · · · Bp ATS

.

where P is a permutation matrix. The diagonal blocks A1, A2, ..., Ap

correspond to connections between nodes in the same subgraph; the off-diagonal blocks Bi correspond to connections between nodes of distinctsubgraphs, and the block AS represents connections between interface nodes.This permutation strategy can also be used to introduce parallelism in theconstruction of some inherently sequential preconditioners [15, 102]. Theinverse of the permuted matrix admits the decomposition

P−1A−1P−T = L−T D−1L−1 =

I1 L−T1

I2 L−T2

. . ....

Ip L−Tp

IS

·

T−11

T−12

. . .T−1

p

T−1S

·

I1

I2

. . .Ip

L−11 L−1

2 · · · L−1p IS

.

It can be seen that fill-in in the inverse factor L can occur only in theblocks L−1

i . The use of these techniques might enable the control andprediction of fill-in in the inverse factors, and enhance significantly the

7. Conclusions and perspectives 147

robustness of factorized approximate inverse methods like AINV or FSAIon electromagnetic applications.

In Chapter 2, we have shown that the location of the large entries in theinverse matrix exhibit some structure and thus a non-factorized approximateinverse can be a good candidate to precondition these systems effectively.In particular, preconditioners based on the Frobenius-norm minimizationare much less prone to instabilities than incomplete factorization methods.To be computationally affordable on dense systems, these preconditionersrequire a suitable strategy to identify the relevant entries to consider inthe original matrix A in order to define small least-squares problems, aswell as an appropriate sparsity structure for the approximate inverse. InChapter 3 we exploited the decay of the discrete Green’s function to computean effective a priori pattern for the approximate inverse. We have shownthat by using additional geometric information from the underlying mesh,it is possible to construct robust sparse preconditioners at an affordablecomputational and memory cost. An important feature of the patternselection strategy based on geometric information is that it does not requireaccess to all the entries of the matrix A, so that it is well suited for animplementation in a fast multipole setting where A is not directly availablebut where only the near field entries are computed. Strategies that useinformation from the connectivity graph of the underlying mesh requireless computational effort to construct the pattern but are generally lesseffective. Also, they may not handle complex geometries very well wheresome parts of the object are not connected. By retaining two differentdensities in the patterns of A and M we can increase the robustness of theresulting preconditioner without penalizing the cost of its construction. Thenumerical experiments show that, using this pattern selection strategy, wecan compute a very sparse but effective preconditioner. With the same lowdensity, none of the standard preconditioners that we discussed earlier cancompete with it.

In Chapter 4, we propose two symmetric preconditioners, that canexploit the symmetry of the original matrix in the associated preconditionerand enable the use of a symmetric Krylov solver that proves to be cheaperthan GMRES iterations. The first strategy simply averages the off-diagonalentries. We have shown that this approach, used in combination withthe SQMR solver, is fairly robust, and is totally insensitive to columnordering; however, the construction of the preconditioner requires the samecomputational cost as in the unsymmetric case. The second strategyonly computes the lower triangular part, including the diagonal, of thepreconditioner. The nonzeros calculated are reflected with respect to thediagonal and are used to update the right-hand sides of the subsequent least-squares problems involved in the construction of the remaining columnsof the preconditioner. If m denotes the number of nonzeros entries inthe approximate inverse, this method only computes (m + n)/2 nonzeros.


Thus the overall computational complexity for the construction can beconsiderably smaller. Through numerical experiments, we have shown thatthis method is not too sensitive to column ordering. Both these methodsappear to be efficient and exhibit a remarkable robustness when used inconjunction with SQMR. They are promising for use in a parallel andmultipole context for the solution of large systems. The first approachis straightforward to parallelize even though it requires more flops for itsconstruction. It would probably be the preconditioner of choice in a paralleldistributed fast multipole environment. The second approach is under halfas expensive and can be computationally attractive especially for largeproblems. Possibilities for parallelizing this approach also exist by usingcolouring techniques to detect independent subsets of columns that can becomputed in parallel. In a multipole context the algorithm must be recastby blocks, and Level 2 BLAS operations have to be used for the least-squaresupdates. Further work is required to implement this procedure.

In Chapter 5, we illustrated the implementation of the Frobenius-normminimization preconditioner within a parallel out-of-core research code thatimplements the Fast Multipole Method (FMM), and we have studied thenumerical and parallel scalability of the implementation for the solution oflarge scattering applications, up to one million unknowns. On problems ofthis size, the construction of the preconditioner can be demanding in termsof time, memory and disk resources. A potential limit of the Frobenius-norm minimization preconditioner, and in general of any sparse approximateinverse method, is that it tends to be less effective on large problemsbecause the number of iterations increases rapidly with the problem size.In Chapter 5, we proposed the use of inner-outer iterative solution schemesimplemented in a multipole context with different levels of accuracy for thematrix-vector products in the inner and outer loops. We have shown that theuse of the multipole matrix can be effective to balance the locality of thepreconditioner. In particular, the combination FGMRES(5)/GMRES(20)can enhance the robustness of the preconditioner, reducing significantly thecomputational cost and the storage requirement for the solution of largeproblems. We have successfully used this approach to solve systems ofsize up to one million unknowns; the approach is very promising for thesolution of challenging real-life industrial applications. Some questions arestill open. One issue concerns the optimal tuning of the inner accuracy. Inthe numerical experiments, we selected a “medium” accuracy for the inneriteration. A multilevel scheme can be designed as a natural extension of thesimple two-level scheme considered in Chapter 5, with several embeddedFGMRES to go down to the lowest accuracy in the innermost GMRES.Variants of these schemes can be based on the flexible variants of the SQMRmethod as outer solvers and SQMR as the inner solver.

In Chapter 6, we investigated a refinement technique for the approximateinverse based on low-rank corrections computed by using spectral


information from the preconditioned matrix. We have illustrated theeffectiveness and the robustness of the proposed preconditioner on a setof small but tough problems arising from electromagnetic applications, andwe have analysed the cost of the algorithm. The conclusion is that themethod is very well suited for the solution of electromagnetic problems; theextra cost for the computation of the preconditioner updates can be quicklyamortized by considering a few right-hand sides. Also, the preconditioner isindependent of the Krylov solver used for the actual solution of the linearsystem. A symmetric formulation has been derived and numerical resultshave shown the remarkable robustness of this formulation when used inconjunction with SQMR. The numerical results are encouraging for theinvestigation of this procedure for the solution of much larger problems. Thecomputation of the preconditioning updates by the IRA method is based onmatrix-vector operations and thus can be easily integrated within the codethat implements the Fast Multipole Method. It could be combined withinner-outer schemes via embedded iterations to construct preconditionerson electromagnetic applications that might be expected to be very robustand effective. Although the electromagnetic context is an ideal settingfor its application, the proposed technique can be effectively used in othercontexts, as it only requires algebraic information from the preconditionedmatrix. Preliminary results on domain decomposition methods and bothSPD and unsymmetric linear systems from the Harwell-Boeing sparse matrixcollection are encouraging.

The idea of updating the preconditioner by using low-rank corrections isa natural one in the context of integral equations, and is inherently relatedto the algebraic structure of the discretizated integral operator. A blockstructure of the coefficient matrix naturally emerges when the oct-tree isconsidered and the unknowns are numbered consecutively by leaf-boxes. Ifthe n unknowns are divided into p groups, the coefficient matrix can bewritten in the form:

A = D + Q

where

D = diag{T11, T22, ..., Tpp}

is a block-diagonal matrix, and Q is a block matrix with zero blocks on thediagonal. Each block Tkk represents the connection between edges withinthe same leaf-box and each off-diagonal block Qkl, l 6= k represents theconnection between edges of group k and group l. The off-diagonal blocksQkl corresponding to far-away groups k and l have low rank rkl and thuscan be expressed as the sum of rkl rank-one updates as follows


Qkl =rkl∑

i=1

uiklv

ikl

T = UklVTkl

whereUkl = [u1

kl, u2kl, ..., u

rklkl ]

Vkl = [v1kl, v

2kl, ..., v

rklkl ]

Matrix-free methods [75] use this idea to approximate nonsingularcoefficient matrices in 3D boundary integral applications from CEMand CFD by purely algebraic techniques. The Matrix DecompositionAlgorithm and its multilevel variant [107, 108] approximates the far-fieldinteractions of electromagnetic scattering problems by standard linearalgebra techniques. In [10] an iterative algorithm is proposed to computelow-rank approximations to blocks of large unstructured matrices; thealgorithm uses only a few entries from the original blocks and theapproximate rank is not needed in advance.

The idea of low-rank approximations can be exploited in the design ofthe preconditioner [21]. Denoting by U and V the matrices

U =

U11 0 · · · 0 U12 0 · · · 0 · · · U1p 0 · · · 00 U21 · · · 0 0 U22 · · · 0 · · · 0 U2p · · · 0...

. . ....

.... . .

... · · · .... . .

...0 0 Up1 0 0 Up2 · · · 0 0 Upp

V =

V11 V21 · · · Vp1 0 0 · · · 0 · · · 0 0 · · · 0

0 0 · · · 0 V12 V22 · · · Vp2 · · · ... 0...

...... · · · 0 0

0 0 · · · 0 0 · · · · · · 0 · V1p V2p · · · Vpp

the matrix Q can be written as the product UV T . In our case the blocks Uii

and Vii are null for i = 1, ..., p. By using the Sherman-Morrison-Woodburyformula [73], the following explicit expression can be derived for the inverseof B :

B−1 = (D + UV T )−1 = D−1 −D−1U(I + G)−1V T D−1

where G = V T D−1U is of order m =∑

k,l rkl. The application of thepreconditioner requires “inverting”, that is exactly factorizing, the diagonalblocks of D and the matrix I + G which has small size. It might profitableto explore this further in future research. Preliminary results show thatthis strategy can be effective provided that the diagonal blocks are exactly


factorized. Some questions are still open. An explicit computation of thesingular value decomposition of off-diagonal blocks is too expensive and isnot feasible in a multipole context where the entries of these blocks arenot available. More sophisticated block partitioning schemes need to beinvestigated to select the rank of the off-diagonal blocks appropriately.

Some methods work well for our applications and we have tuned them forproblems in this area. It would be interesting in future work to see whetherthese methods are applicable in other areas, for example in acoustics.

Appendix A

Numerical results with thetwo-level spectralpreconditioner

153

154 A. Numerical results with the two-level spectral preconditioner

A.1 Effect of the low-rank updates on the GMRESconvergence

Example 1


GMRES(m), Toler. 1e-8

m=10 m=30 m=50 m=80 m=1100 358 213 144 79 791 314 179 138 76 762 314 173 127 73 733 313 172 116 70 704 310 169 113 69 695 313 169 108 67 676 315 162 97 64 647 315 145 91 62 628 315 138 78 59 599 315 134 75 57 57

10 248 103 60 53 5311 206 98 53 52 5212 197 96 52 52 5213 194 91 52 51 5114 192 90 51 51 5115 191 90 51 51 5116 189 80 48 48 4817 189 80 48 48 4818 175 80 48 48 4819 166 60 42 42 4220 153 54 37 37 37

Table A.1.1: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 for increasing size of thecoarse space.

A.1. Effect of the low-rank updates on the GMRES convergence 155

Example 1



m=10 m=30 m=50 m=80 m=1100 165 103 75 60 601 154 87 64 56 562 154 87 62 54 543 154 87 62 53 534 154 87 62 53 535 154 87 61 53 536 153 77 50 50 507 153 73 48 48 488 153 72 45 45 459 153 68 44 44 44

10 129 52 40 40 4011 102 50 39 39 3912 97 49 39 39 3913 92 48 38 38 3814 92 48 38 38 3815 92 48 38 38 3816 91 45 35 35 3517 92 45 35 35 3518 97 45 35 35 3519 79 32 31 31 3120 69 26 26 26 26



Example 2



m=10 m=30 m=50 m=80 m=1100 +1500 +1500 496 311 1981 310 235 192 151 1072 306 222 184 144 1043 308 209 177 138 1014 304 208 170 135 975 309 206 164 132 966 313 205 158 123 927 246 174 146 108 888 205 159 138 102 879 205 159 138 101 87

10 198 155 136 99 8611 198 154 136 98 8612 198 154 136 96 8413 198 153 136 89 8314 185 131 109 74 7415 175 138 115 75 7516 186 137 112 74 7417 159 117 98 70 7018 192 135 105 70 7019 167 126 98 68 6820 187 143 112 73 73



Example 2



m=10 m=30 m=50 m=80 m=1100 145 110 95 76 761 144 110 95 76 762 139 104 91 73 733 140 103 90 73 734 140 103 90 73 735 140 102 90 73 736 143 100 88 72 727 119 89 81 68 688 100 83 76 65 659 100 82 76 65 65

10 99 82 76 64 6411 99 82 76 64 6412 99 82 76 64 6413 99 81 75 64 6414 80 60 48 48 4815 77 67 53 52 5216 87 69 60 54 5417 69 56 47 47 4718 87 66 52 51 5119 73 58 46 46 4620 93 74 65 58 58



Example 3



m=10 m=30 m=50 m=80 m=1100 268 174 130 79 791 267 171 123 76 762 271 171 121 72 723 263 170 115 70 704 260 153 100 67 675 255 141 93 64 646 209 111 79 60 607 209 111 78 60 608 209 111 78 58 589 137 86 66 55 55

10 127 82 61 54 5411 126 82 61 54 5412 115 80 56 53 5313 119 81 56 52 5214 119 81 56 52 5215 114 79 52 51 5116 104 74 49 49 4917 105 68 48 48 4818 103 57 43 43 4319 97 59 44 44 4420 96 57 44 44 44



Example 3



m=10 m=30 m=50 m=80 m=1100 129 89 70 57 571 129 89 70 56 562 129 88 69 56 563 128 88 69 56 564 126 86 57 53 535 125 76 49 49 496 107 60 45 45 457 107 60 45 45 458 107 60 45 45 459 73 51 42 42 42

10 65 49 40 40 4011 65 49 40 40 4012 63 48 40 40 4013 64 48 40 40 4014 63 48 40 40 4015 62 46 40 40 4016 59 45 37 37 3717 54 44 36 36 3618 53 32 31 31 3119 53 34 33 33 3320 53 33 32 32 32



Example 4



m=10 m=30 m=50 m=80 m=1100 145 113 90 71 711 145 113 90 68 682 134 105 83 65 653 133 105 83 65 654 127 97 74 61 615 126 95 75 61 616 123 91 63 58 587 101 77 58 56 568 101 77 58 56 569 100 75 58 56 56

10 72 55 43 43 4311 95 74 55 53 5312 86 70 51 51 5113 86 68 49 49 4914 84 66 49 49 4915 82 63 49 49 4916 81 63 49 49 4917 81 65 49 49 4918 80 65 49 49 4919 82 65 49 49 4920 76 59 47 47 47



Example 4



m=10 m=30 m=50 m=80 m=1100 71 57 48 48 481 71 57 48 48 482 68 54 45 45 453 68 53 45 45 454 65 46 43 43 435 65 46 42 42 426 64 45 41 41 417 49 41 38 38 388 49 41 38 38 389 48 41 38 38 38

10 20 18 18 18 1811 46 38 37 37 3712 44 36 34 34 3413 44 36 34 34 3414 43 35 34 34 3415 43 34 34 34 3416 42 34 34 34 3417 43 34 34 34 3418 43 34 34 34 3419 43 34 34 34 3420 40 32 32 32 32



Example 5



m=10 m=30 m=50 m=80 m=1100 297 87 75 66 661 290 78 75 66 662 290 78 75 66 663 287 66 68 58 584 254 66 64 58 585 232 66 62 58 586 392 66 50 50 507 52 43 39 39 398 52 43 39 39 399 53 43 39 39 39

10 53 43 40 40 4011 53 43 40 40 4012 52 44 38 38 3813 58 46 43 43 4314 50 44 38 38 3815 51 44 38 38 3816 51 44 38 38 3817 51 44 38 38 3818 60 45 40 40 4019 59 45 41 41 4120 60 45 42 42 42



Example 5



m=10 m=30 m=50 m=80 m=1100 110 46 42 42 421 109 45 41 41 412 109 45 41 41 413 104 34 33 33 334 88 34 33 33 335 73 34 33 33 336 109 35 33 33 337 23 21 21 21 218 23 21 21 21 219 23 21 21 21 21

10 23 22 22 22 2211 23 22 22 22 2212 24 21 21 21 2113 28 24 24 24 2414 23 21 21 21 2115 23 21 21 21 2116 23 21 21 21 2117 23 21 21 21 2118 30 24 24 24 2419 30 24 24 24 2420 32 24 24 24 24



A.2 Experiments with the operator WH = V Hε M1

Example 1



m=10 m=30 m=50 m=80 m=1100 358 213 144 79 791 315 176 137 76 762 314 171 125 72 723 314 171 115 70 704 313 169 109 68 685 306 171 107 67 676 303 169 96 64 647 298 145 90 61 618 294 138 76 58 589 303 134 71 57 57

10 244 100 59 53 5311 206 94 53 51 5112 190 96 52 51 5113 177 88 51 51 5114 177 88 50 50 5015 180 88 50 50 5016 184 80 47 47 4717 180 80 47 47 4718 180 79 47 47 4719 174 76 46 46 4620 174 61 44 44 44

Table A.2.11: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 for increasing size of thecoarse space. The formulation of Theorem 2 with the choice WH = V H

ε M1

is used for the low-rank updates.

A.2. Experiments with the operator WH = V Hε M1 165

Example 1



m=10 m=30 m=50 m=80 m=1100 165 103 75 60 601 151 86 63 56 562 152 86 61 53 533 153 86 61 53 534 150 86 61 53 535 150 84 61 53 536 146 81 50 50 507 146 48 48 48 488 138 70 44 44 449 141 70 43 43 43

10 144 64 40 40 4011 120 51 39 39 3912 98 49 38 38 3813 97 47 37 37 3714 93 46 37 37 3715 93 46 37 37 3716 90 46 34 34 3417 90 44 34 34 3418 90 44 34 34 3419 93 44 34 34 3420 93 33 32 32 32


ε M1



Example 2



m=10 m=30 m=50 m=80 m=1100 +1500 +1500 496 311 1981 285 215 177 141 1032 286 202 169 133 1003 286 193 160 129 974 279 192 152 125 935 286 190 149 124 926 286 189 146 111 897 229 161 137 95 858 188 147 129 90 849 187 146 129 91 84

10 185 144 127 90 8311 184 143 127 89 8312 196 148 131 91 8313 187 147 130 85 8114 190 144 129 80 8015 189 144 126 77 7716 183 137 114 74 7417 178 135 109 73 7318 179 136 108 73 7319 178 135 102 70 7020 168 130 100 69 69


ε M1



Example 2



m=10 m=30 m=50 m=80 m=1100 145 110 95 76 761 119 90 83 69 692 118 88 80 67 673 123 88 80 67 674 122 88 80 67 675 124 88 80 67 676 123 88 80 66 667 106 78 71 62 628 84 71 64 58 589 86 71 65 58 58

10 85 71 65 58 5811 85 71 64 58 5812 94 74 69 61 6113 94 74 68 61 6114 94 74 68 61 6115 92 74 66 58 5816 88 69 60 54 5417 87 69 60 54 5418 86 69 60 54 5419 88 68 58 53 5320 85 67 56 53 53


ε M1



Example 3



m=10 m=30 m=50 m=80 m=1100 268 174 130 79 791 254 170 121 76 762 286 170 119 72 723 284 169 114 70 704 259 150 99 66 665 269 141 92 63 636 221 110 78 60 607 222 108 77 59 598 225 109 77 58 589 133 86 65 55 55

10 126 82 59 53 5311 124 82 60 53 5312 117 81 56 52 5213 117 81 56 52 5214 119 80 56 52 5215 119 79 53 51 5116 105 74 49 49 4917 106 69 47 47 4718 105 65 46 46 4619 99 58 44 44 4420 96 58 44 44 44


ε M1



Example 3



m=10 m=30 m=50 m=80 m=1100 129 89 70 57 571 130 88 70 56 562 147 88 68 56 563 145 88 69 56 564 139 83 55 52 525 135 74 49 49 496 116 59 45 45 457 115 60 45 45 458 115 60 45 45 459 70 50 41 41 41

10 66 48 40 40 4011 66 48 40 40 4012 64 47 39 39 3913 64 47 39 39 3914 64 47 39 39 3915 62 46 39 39 3916 56 44 37 37 3717 56 42 36 36 3618 55 41 35 35 3519 56 34 33 33 3320 56 33 32 32 32


ε M1



Example 4



m=10 m=30 m=50 m=80 m=1100 145 113 90 71 711 145 113 90 68 682 134 106 83 65 653 130 103 85 65 654 125 97 74 61 615 123 93 73 61 616 120 91 66 58 587 101 78 58 56 568 101 78 58 56 569 98 78 58 56 56

10 94 74 56 55 5511 93 74 55 53 5312 86 70 52 51 5113 86 68 50 50 5014 85 67 49 49 4915 82 64 49 49 4916 81 64 49 49 4917 82 66 49 49 4918 81 66 49 49 4919 81 67 50 50 5020 77 62 47 47 47


ε M1



Example 4



m=10 m=30 m=50 m=80 m=1100 71 57 48 48 481 71 57 48 48 482 68 54 45 45 453 66 50 45 45 454 64 46 43 43 435 63 45 41 41 416 63 45 41 41 417 50 41 38 38 388 50 41 38 38 389 49 41 38 38 38

10 46 39 37 37 3711 46 38 37 37 3712 44 36 35 35 3513 45 36 35 35 3514 44 35 34 34 3415 43 34 34 34 3416 43 34 33 33 3317 43 35 34 34 3418 43 35 34 34 3419 43 35 34 34 3420 41 33 32 32 32


ε M1



Example 5



m=10 m=30 m=50 m=80 m=1100 297 87 75 66 661 312 79 75 66 662 311 79 75 66 663 354 66 68 58 584 345 66 64 58 585 270 66 62 58 586 559 66 50 50 507 53 43 40 40 408 55 43 40 40 409 55 43 40 40 40

10 54 44 41 41 4111 53 44 41 41 4112 52 43 38 38 3813 53 44 38 38 3814 52 44 39 39 3915 52 44 39 39 3916 52 44 39 39 3917 52 44 39 39 3918 52 44 39 39 3919 53 45 39 39 3920 53 45 40 40 40


ε M1



Example 5



m=10 m=30 m=50 m=80 m=1100 110 46 42 42 421 111 45 41 41 412 111 45 41 41 413 112 34 34 34 344 92 35 34 34 345 72 35 34 34 346 121 36 34 34 347 23 21 21 21 218 23 22 22 22 229 24 22 22 22 22

10 24 21 21 21 2111 23 21 21 21 2112 23 21 21 21 2113 23 21 21 21 2114 24 21 21 21 2115 24 21 21 21 2116 24 21 21 21 2117 24 22 22 22 2218 24 22 22 22 2219 24 22 22 22 2220 24 22 22 22 22


ε M1



A.3 Cost of the eigencomputation

Example 1

Nr. of eigenvalues M-V productsCPU-time(in sec)

A-V

1 90 15 92 388 41 363 243 22 234 281 35 265 354 33 336 293 27 257 247 23 218 198 19 179 179 26 15

10 138 14 411 186 26 312 213 21 413 189 22 314 235 23 415 276 36 416 266 31 417 514 50 818 336 34 519 336 34 420 650 75 7

Table A.3.21: Number of matrix-vector products, CPU time andamortization vectors required by the IRAM algorithm to computeapproximate eigenvalues nearest zero and the corresponding eigenvectors.The computation of the amortization vectors is relative to GMRES(10) anda tolerance of 10−5.

A.3. Cost of the eigencomputation 175

Example 2


A-V

1 135 18 1352 440 59 743 524 71 1054 469 63 945 423 58 856 357 49 1797 340 47 148 333 46 89 345 48 8

10 358 50 811 527 74 1212 579 81 1313 574 81 1314 1010 142 1615 1762 303 2616 1053 149 1917 751 107 1018 3050 514 5319 2359 335 3320 1066 188 21



Example 3


A-V

1 120 29 -2 336 79 -3 290 69 2904 250 60 845 192 46 486 183 45 97 175 43 88 165 41 89 154 39 3

10 169 42 311 157 40 312 219 56 413 224 62 414 212 70 415 223 57 416 202 53 317 226 59 418 264 69 419 264 69 420 300 78 4


A.3. Cost of the eigencomputation 177

Example 4


A-V

1 75 25 -2 168 56 563 214 72 724 178 60 305 149 57 256 180 73 267 134 47 78 236 81 119 261 90 12

10 207 72 511 191 67 812 197 70 813 248 88 1014 309 109 1215 355 125 1316 412 156 1517 408 144 1518 390 138 4919 426 191 1620 345 159 12



Example 5


A-V

1 60 30 602 58 29 583 107 53 184 103 51 55 163 81 56 156 78 1567 128 65 28 105 54 29 125 65 2

10 128 67 211 160 83 212 131 94 213 162 85 214 126 68 215 164 97 216 237 124 317 227 119 318 223 118 319 756 454 1020 220 118 3


A.4. Sensitivity of the preconditioner to the accuracy of the eigencomputation179

A.4 Sensitivity of the preconditioner to theaccuracy of the eigencomputation

Example 1



m=10 m=30 m=50 m=80 m=1100 358 213 144 79 791 315 176 137 76 762 314 171 125 72 723 314 171 115 70 704 313 169 109 68 685 306 171 107 67 676 303 169 96 64 647 298 145 90 61 618 294 138 76 58 589 303 134 71 57 57

10 244 100 59 53 5311 206 94 53 51 5112 190 96 52 51 5113 177 88 51 51 5114 177 88 50 50 5015 182 88 50 50 5016 184 80 47 47 4717 171 80 47 47 4718 176 79 47 47 4719 177 77 47 47 4720 178 61 44 44 44

Table A.4.26: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the residual by 10−8 for increasing size of the coarse space. Theformulation of Theorem 2 with the choice WH = V H

ε M1 is used for thelow-rank updates. The computation of Ritz pairs is carried out at machineprecision.


Example 1



m=10 m=30 m=50 m=80 m=1100 165 103 75 60 601 151 86 63 56 562 152 86 61 53 533 153 86 61 53 534 150 86 61 53 535 150 84 61 53 536 146 81 50 50 507 138 70 48 48 488 141 70 44 44 449 144 64 43 43 43

10 120 51 40 40 4011 98 49 39 39 3912 97 47 38 38 3813 93 46 37 37 3714 93 46 37 37 3715 91 46 37 37 3716 92 44 34 34 3417 89 44 34 34 3418 90 44 34 34 3419 92 44 34 34 3420 90 34 32 32 32


ε M1

is used for the low-rank updates. The computation of Ritz pairs is carriedout at machine precision.


Example 2



m=10 m=30 m=50 m=80 m=1100 +1500 496 311 198 1231 215 177 141 103 1032 202 169 133 100 1003 193 160 129 97 974 192 152 125 93 935 190 149 124 92 926 189 146 111 89 897 161 137 95 85 858 147 129 90 84 849 146 129 91 84 84

10 144 127 90 83 8311 143 127 89 83 8312 143 126 88 82 8213 140 122 80 80 8014 139 118 79 79 7915 139 119 79 79 7916 139 118 79 79 7917 139 116 76 76 7618 135 113 75 75 7519 135 114 75 75 7520 131 109 73 73 73


ε M1



Example 2



m=10 m=30 m=50 m=80 m=1100 145 110 95 76 761 119 90 83 69 692 118 88 80 67 673 123 88 80 67 674 122 88 80 67 675 124 88 80 67 676 123 88 80 66 667 106 78 71 62 628 84 71 64 58 589 86 71 65 58 58

10 85 71 65 58 5811 85 71 64 58 5812 84 71 64 58 5813 84 70 62 56 5614 83 70 62 56 5615 83 70 62 56 5616 84 70 62 56 5617 84 70 61 55 5518 79 68 59 55 5519 79 68 60 55 5520 79 67 58 53 53


ε M1



Example 3



m=10 m=30 m=50 m=80 m=1100 268 174 130 79 791 254 170 121 76 762 286 170 119 72 723 284 169 114 70 704 259 150 99 66 665 269 141 92 63 636 221 110 78 60 607 222 108 77 59 598 225 109 77 58 589 133 86 65 55 55

10 126 82 59 53 5311 124 82 60 53 5312 117 81 56 52 5213 117 81 56 52 5214 119 80 56 52 5215 117 79 53 51 5116 104 73 49 49 4

¯9

17 106 70 47 47 4718 102 64 46 46 4619 94 58 44 44 4420 99 58 44 44 44


ε M1



Example 3



m=10 m=30 m=50 m=80 m=1100 129 89 70 57 571 130 88 70 56 562 147 88 68 56 563 145 88 69 56 564 139 83 55 525 135 74 49 49 496 116 59 45 45 457 115 60 45 45 458 115 60 45 45 459 70 50 41 41 41

10 66 48 40 40 4011 66 48 40 40 4012 64 47 39 39 3913 63 47 39 39 3914 63 47 39 39 3915 62 46 39 39 3916 55 44 37 37 3717 56 42 36 36 3618 53 39 35 35 3519 55 34 33 33 3320 55 34 32 32 32


ε M1



Example 4



m=10 m=30 m=50 m=80 m=1100 145 113 90 71 711 145 113 90 68 682 134 106 83 65 653 130 103 85 65 654 125 97 74 61 615 123 93 73 61 616 120 91 66 58 587 101 78 58 56 568 101 78 58 56 569 99 77 58 56 56

10 94 74 56 55 5511 93 74 55 53 5312 86 70 52 51 5113 86 68 50 50 5014 85 67 49 49 4915 82 65 49 49 4916 81 64 49 49 4917 81 66 49 49 4918 81 66 49 49 4919 80 64 49 49 4920 80 65 49 49 49


ε M1



Example 4



m=10 m=30 m=50 m=80 m=1100 71 57 48 48 481 71 57 48 48 482 68 54 45 45 453 66 50 45 45 454 64 46 43 43 435 63 45 41 41 416 63 45 41 41 417 50 41 38 38 388 50 41 38 38 389 48 40 38 38 38

10 46 39 37 37 3711 46 38 37 37 3712 44 36 35 35 3513 45 36 35 35 3514 44 35 34 34 3415 43 34 34 34 3416 43 34 34 34 3417 43 34 34 34 3418 43 34 34 34 3419 43 34 34 34 3420 43 35 34 34 34


ε M1



Example 5



m=10 m=30 m=50 m=80 m=1100 297 87 75 66 661 312 79 75 66 662 311 79 75 66 663 354 66 68 58 584 346 66 64 58 585 270 66 62 58 586 552 66 50 50 507 53 43 40 40 408 55 43 40 40 409 54 44 40 40 40

10 54 44 40 40 4011 50 43 38 38 3812 50 43 38 38 3813 48 43 38 38 3814 48 43 38 38 3815 48 43 38 38 3816 48 43 38 38 3817 48 43 38 38 3818 48 43 38 38 3819 47 41 36 36 3620 50 40 36 36 36


ε M1



Example 5



m=10 m=30 m=50 m=80 m=1100 110 46 42 42 421 111 45 41 41 412 111 45 41 41 413 112 34 34 34 344 92 35 34 34 345 72 35 34 34 346 121 36 34 34 347 23 21 21 21 218 23 22 22 22 229 23 21 21 21 21

10 23 21 21 21 2111 23 21 21 21 2112 23 21 21 21 2113 23 21 21 21 2114 23 21 21 21 2115 23 21 21 21 2116 23 21 21 21 2117 23 21 21 21 2118 23 21 21 21 2119 23 21 21 21 2120 24 21 21 21 21


ε M1


A.5. Experiments with a poor preconditioner M1 189

A.5 Experiments with a poor preconditioner M1

Example 1



m=10 m=30 m=50 m=80 m=1100 818 418 303 193 1421 804 418 303 193 1392 780 419 301 184 1223 784 419 306 178 1124 779 348 262 154 1055 766 328 247 153 1046 696 317 238 148 1027 722 316 233 149 1028 690 318 236 148 1029 710 314 235 148 102

10 666 298 231 145 10111 710 290 227 144 9912 635 260 196 132 9313 628 258 196 131 9314 589 255 195 130 9315 648 256 195 130 9216 626 255 190 126 9117 658 251 185 113 8718 654 250 185 113 8719 615 251 184 113 8720 658 238 161 93 83


ε M1

is used for the low-rank updates. The same nonzero sructure is imposed onA and M1.


Example 1



m=10 m=30 m=50 m=80 m=1100 342 174 138 81 811 338 174 138 82 822 325 174 138 82 823 331 174 138 82 824 327 146 121 74 745 309 134 114 72 726 291 133 113 71 717 302 133 113 71 718 285 133 113 71 719 301 133 113 71 71

10 304 131 111 71 7111 290 127 107 69 6912 267 108 86 65 6513 269 107 85 64 6414 269 107 85 64 6415 268 107 85 63 6316 269 106 85 63 6317 275 106 85 61 6118 270 106 85 61 6119 265 106 85 61 6120 270 99 74 57 57


ε M1

is used for the low-rank updates. The same nonzero structure is imposed onA and M1.


Example 2



m=10 m=30 m=50 m=80 m=1100 +1500 +1500 +1500 1058 5091 518 371 311 265 2242 513 372 310 265 2233 515 372 309 264 2224 513 371 308 263 2205 516 370 308 263 2206 504 370 307 262 2207 515 369 307 263 2208 506 367 306 262 2209 506 367 306 262 219

10 508 365 306 261 21911 502 366 305 261 21912 502 362 304 260 21913 497 363 304 260 21914 499 363 304 260 21915 499 363 304 260 21916 504 363 304 259 21917 497 362 302 259 21818 490 358 299 256 21819 490 358 299 256 21820 492 358 299 255 218


ε M1



Example 2



m=10 m=30 m=50 m=80 m=1100 247 180 155 137 1151 239 174 149 133 1092 237 174 149 133 1093 238 174 149 133 1094 237 174 148 133 1095 239 174 149 133 1096 237 174 148 132 1097 237 174 148 132 1098 239 173 148 132 1099 237 173 148 132 109

10 239 173 148 132 10811 233 173 148 132 10812 235 173 148 132 10813 233 173 148 132 10814 229 173 148 132 10815 237 173 148 132 10816 237 172 148 132 10817 232 172 147 131 10818 232 171 147 130 10719 233 171 147 130 10720 233 170 147 130 107


ε M1



Example 3



m=10 m=30 m=50 m=80 m=1100 303 234 194 159 1321 275 210 176 152 1182 278 210 175 150 1113 276 209 173 149 1114 273 208 173 149 1105 277 208 173 149 1106 273 208 173 148 1087 253 191 163 143 1068 254 191 163 143 1069 253 190 163 141 103

10 220 175 148 134 10011 221 175 148 133 9912 221 173 147 133 9913 216 172 145 131 9914 219 172 145 131 9615 219 168 143 128 9316 217 168 143 127 9317 217 166 142 119 9018 213 164 142 119 9019 213 164 142 119 9020 200 150 133 109 88


ε M1



Example 3



m=10 m=30 m=50 m=80 m=1100 152 118 101 94 861 149 114 97 84 822 149 114 96 83 823 147 113 96 81 814 144 112 95 80 805 145 112 95 80 806 144 112 95 80 807 136 105 91 77 778 137 105 91 77 779 136 105 91 77 77

10 119 9¯7 84 73 73

11 118 97 84 73 7312 118 97 84 72 7213 117 96 83 72 7214 117 96 83 72 7215 117 95 82 71 7116 116 93 81 70 7017 115 90 80 69 6918 114 90 80 68 6819 114 90 80 68 6820 108 84 76 66 66


ε M1



Example 4



m=10 m=30 m=50 m=80 m=1100 256 235 214 194 1711 256 235 214 193 1702 255 232 211 190 1673 255 232 211 190 1664 252 229 207 186 1595 251 229 207 186 1556 250 227 206 185 1557 249 223 199 170 1518 248 222 199 169 1499 248 222 199 169 149

10 248 222 198 169 14811 247 221 197 168 13312 248 220 191 159 12513 240 199 169 148 11914 240 199 169 148 11915 236 195 167 146 11716 237 194 166 146 11717 236 194 166 146 11618 236 194 166 146 11619 229 191 163 139 11420 226 192 163 139 112


ε M1



Example 4



m=10 m=30 m=50 m=80 m=1100 123 110 101 91 871 123 110 101 91 872 122 109 100 91 873 122 109 100 91 874 122 108 99 90 875 121 108 99 90 866 120 107 98 89 867 119 103 95 83 838 119 103 95 83 839 119 103 95 83 83

10 119 102 94 83 8311 118 101 93 82 8112 118 100 92 81 7613 117 96 88 76 7614 117 96 88 76 7515 116 94 87 75 7516 116 92 85 75 7517 116 92 85 75 7518 116 92 85 75 7519 114 91 84 72 7220 112 92 84 72 72


ε M1



Example 5



m=10 m=30 m=50 m=80 m=1100 +1500 321 175 156 1441 +1500 310 153 155 1442 +1500 310 153 155 1443 1443 174 149 118 1044 1341 175 149 117 1045 1292 174 149 116 1046 1058 193 141 94 897 132 95 86 74 748 132 95 86 74 749 132 95 86 74 74

10 132 95 86 74 7411 132 94 84 74 7412 129 93 84 74 7413 128 92 85 74 7414 125 90 86 74 7415 120 90 85 74 7416 120 90 85 74 7417 120 90 85 74 7418 119 88 84 74 7419 119 86 82 70 7020 120 86 83 70 70


ε M1

is used for the low-rank updates. The same nonzero sructure is imposed onA and M1.


Example 5



m=10 m=30 m=50 m=80 m=1100 527 92 80 72 721 523 90 80 72 722 523 90 80 72 723 509 66 60 57 574 462 65 61 57 575 433 65 61 57 576 270 64 76 57 577 62 43 41 41 418 62 43 41 41 419 62 43 41 41 41

10 62 43 41 41 4111 62 43 41 41 4112 59 44 41 41 4113 58 44 41 41 4114 56 43 40 40 4015 55 40 37 37 3716 55 40 37 37 3717 55 40 37 37 3718 55 40 37 37 3719 55 40 37 37 3720 56 39 37 37 37


ε M1



Example 1Nr. of eigenvalues M-V products CPU-time A-V

1 480 54 1202 5198 526 3063 1580 174 1444 1355 153 915 997 96 316 926 116 197 891 82 238 965 89 179 1367 126 34

10 1317 139 3511 1331 124 2612 1829 248 1213 1738 316 2414 2872 302 4015 3084 355 4216 2574 258 3617 2654 436 4018 2156 253 3019 1689 163 2220 3284 336 46




1 285 47 362 1714 408 183 10242 1690 11384 2083 280 2095 9892 1320 12376 5353 716 547 2599 548 2608 21113 2845 26409 2716 387 272

10 3311 798 41411 3197 442 22912 2534 357 21213 2605 358 18714 2515 345 14015 6477 943 64816 3079 429 30817 3054 480 20418 4658 653 31119 3806 581 27220 9304 1319 665




1 180 43 602 631 148 2113 991 234 1994 955 288 1205 703 170 1016 590 141 747 537 172 348 743 179 509 542 152 16

10 736 177 2311 904 240 2712 745 227 2213 784 242 2314 986 247 2915 1443 388 4216 1333 335 3817 1331 284 3618 1224 311 3319 1495 487 4020 1652 411 38




1 555 189 -2 4307 1439 43073 1402 461 14024 2154 709 21545 1343 617 6726 1097 475 667 1044 457 2618 1541 757 3869 1413 614 354

10 1441 496 36111 3440 1166 68812 3688 1241 73813 4473 1548 74614 2514 948 41915 1695 573 24316 5491 1864 78517 2787 993 39918 3573 1217 51119 7160 2462 79620 8188 2879 745




1 105 51 272 58 29 153 237 115 144 252 145 45 163 81 26 251 128 17 239 134 18 229 114 19 209 118 1

10 213 154 111 215 109 112 608 565 213 665 348 214 655 383 215 817 420 216 850 439 217 1060 620 318 1247 622 319 973 842 320 4293 2206 10



A.6 Numerical results for the symmetricformulation

Example 1



m=10 m=30 m=50 m=80 m=1100 358 213 144 79 791 316 179 137 76 762 312 172 126 73 733 315 170 116 70 704 308 166 112 68 685 315 170 108 67 676 311 170 96 64 647 290 144 90 62 628 292 138 77 58 589 302 134 72 57 57

10 244 99 60 53 5311 204 96 54 51 5112 215 96 54 51 5113 208 89 52 51 5114 184 88 51 51 5115 186 88 51 51 5116 189 80 47 47 4717 195 80 47 47 4718 205 77 47 47 4719 182 77 47 47 4720 173 63 44 44 44

Table A.6.51: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 for increasing size of thecoarse space. The symmetric formulation of Theorem 2 with the choiceW = Vε is used for the low-rank updates.

A.6. Numerical results for the symmetric formulation 205

Example 1



m=10 m=30 m=50 m=80 m=1100 165 103 75 60 601 153 87 64 56 562 154 87 62 53 533 154 87 61 53 534 150 87 61 53 535 152 87 61 53 536 151 81 50 50 507 148 72 48 48 488 147 70 44 44 449 146 6

¯6 43 43 43

10 122 51 40 40 4011 110 50 39 39 3912 108 49 38 38 3813 106 47 38 38 3814 106 46 38 38 3815 95 47 38 38 3816 105 44 35 35 3517 109 45 35 35 3518 106 45 35 35 3519 105 44 35 35 3520 105 34 32 32 32



Example 2



m=10 m=30 m=50 m=80 m=1100 +1500 +1500 496 311 1981 304 235 192 151 1072 305 222 184 143 1043 310 209 177 138 1014 303 208 170 136 975 310 206 164 133 956 307 205 160 123 927 239 174 146 107 898 201 159 138 100 879 202 159 136 99 86

10 194 155 135 97 8611 194 155 135 97 8612 193 155 135 95 8413 193 154 134 88 8314 194 150 133 82 8115 193 147 130 78 7816 185 143 119 75 7517 183 141 115 74 7418 187 141 113 74 7419 185 140 107 71 7120 169 135 103 70 70



Example 2



m=10 m=30 m=50 m=80 m=1100 145 110 95 76 761 143 110 95 76 762 140 104 91 73 733 146 103 90 73 734 142 103 90 73 735 147 103 90 73 736 148 101 89 72 727 118 89 82 68 688 118 81 75 64 649 99 80 75 64 64

10 97 80 75 64 6411 97 80 74 63 6312 96 80 75 63 6313 97 80 74 63 6314 96 79 74 63 6315 99 77 70 60 6016 96 74 65 57 5717 92 74 65 57 5718 93 74 65 57 5719 93 73 63 56 5620 86 71 60 54 54



Example 3



m=10 m=30 m=50 m=80 m=1100 268 174 130 79 791 260 171 121 76 762 267 169 120 72 723 272 169 114 70 704 256 155 100 67 675 262 142 93 64 646 199 112 79 60 607 202 112 79 60 608 208 112 79 58 589 135 87 66 55 55

10 126 82 62 54 5411 125 82 61 54 5412 115 81 57 53 5313 118 81 57 53 5314 120 81 58 53 5315 110 76 50 50 5016 103 69 47 47 4717 105 65 46 46 4618 102 66 47 47 4719 94 59 45 45 4520 90 58 44 44 44



Example 3



m=10 m=30 m=50 m=80 m=1100 129 89 70 57 571 129 88 70 56 562 133 88 69 56 563 137 88 69 56 564 127 86 57 53 535 129 76 49 49 496 108 60 45 45 457 108 60 45 45 458 109 60 45 45 459 73 51 42 42 42

10 66 49 40 40 4011 65 49 40 40 4012 62 48 40 40 4013 61 48 40 40 4014 66 48 40 40 4015 56 42 36 36 3616 50 40 34 34 3417 52 38 34 34 3418 56 42 35 35 3519 54 36 34 34 3420 53 35 33 33 33



Example 4



m=10 m=30 m=50 m=80 m=1100 145 113 90 71 711 145 113 90 68 682 135 106 83 65 653 131 103 85 65 654 126 97 74 61 615 124 94 72 61 616 122 91 64 58 587 101 75 58 56 568 99 75 58 56 569 94 74 58 56 56

10 93 74 55 55 5511 86 74 55 53 5312 86 70 51 51 5113 85 68 50 50 5014 83 67 49 49 4915 82 65 49 49 4916 82 65 49 49 4917 82 65 49 49 4918 82 66 49 49 4919 81 66 50 50 5020 77 61 47 47 47



Example 4



m=10 m=30 m=50 m=80 m=1100 71 57 48 48 481 71 57 48 48 482 68 54 45 45 453 66 50 45 45 454 65 46 43 43 435 64 45 41 41 416 64 45 41 41 417 50 41 38 38 388 50 41 38 38 389 48 41 38 38 38

10 46 38 37 37 3711 46 38 37 37 3712 45 36 35 35 3513 45 36 35 35 3514 44 35 34 34 3415 43 34 34 34 3416 43 34 34 34 3417 43 35 34 34 3418 43 35 34 34 3419 43 35 34 34 3420 40 33 32 32 32



Example 5



m=10 m=30 m=50 m=80 m=1100 297 87 75 66 661 290 78 75 66 662 290 78 75 66 663 287 66 68 58 584 252 66 64 58 585 214 66 62 58 586 430 66 50 50 507 51 43 39 39 398 52 43 39 39 399 53 43 39 39 39

10 52 44 40 40 4011 52 44 40 40 4012 49 44 38 38 3813 49 44 38 38 3814 50 44 38 38 3815 50 44 38 38 3816 50 44 38 38 3817 50 44 38 38 3818 50 44 38 38 3819 52 44 39 39 3920 52 44 39 39 39



Example 5



m=10 m=30 m=50 m=80 m=1100 110 46 42 42 421 109 45 41 41 412 109 45 41 41 413 104 34 33 33 334 88 34 33 33 335 72 34 33 33 336 102 36 33 33 337 23 21 21 21 218 23 21 21 21 219 23 21 21 21 21

10 23 21 21 21 2111 23 21 21 21 2112 23 20 20 20 2013 23 21 21 21 2114 23 21 21 21 2115 23 21 21 21 2116 23 21 21 21 2117 23 21 21 21 2118 23 21 21 21 2119 23 21 21 21 2120 24 22 22 22 22




Example

Nr. 1 Nr. 2 Nr. 3 Nr. 4 Nr. 50 103 161 92 61 511 97 129 90 61 512 90 119 85 60 513 84 120 78 56 404 78 117 77 54 405 71 108 71 53 406 65 104 67 50 407 60 99 67 49 338 60 97 66 49 339 58 94 58 49 34

10 58 91 56 46 3311 55 82 58 46 3312 52 82 56 42 3413 47 86 52 40 3314 44 77 51 41 3315 44 76 51 41 2616 40 73 48 41 3417 42 70 49 41 3418 41 69 48 41 3419 37 66 47 39 3420 37 68 46 39 34

Table A.6.61: Number of iterations required by SQMR preconditioned by aFrobenius-norm minimization method updated with spectral corrections toreduce the normwise backward error by 10−8 for increasing size of the coarsespace. The symmetric formulation of Theorem 2 with the choice W = Vε isused for the low-rank updates.



Example

Nr. 1 Nr. 2 Nr. 3 Nr. 4 Nr. 50 74 70 58 30 241 74 70 58 30 232 65 70 58 30 233 60 70 58 31 234 57 70 55 27 235 49 70 50 27 236 47 70 45 27 157 37 69 45 23 158 40 69 45 23 159 40 56 42 23 14

10 40 58 42 23 1411 34 59 42 21 1412 30 56 40 21 1413 28 59 37 20 1414 25 59 36 20 1415 23 51 36 20 1416 23 47 33 20 1417 23 47 33 19 1418 23 47 33 20 1419 22 42 33 19 1420 22 43 33 19 14

Table A.6.62: Number of iterations required by SQMR preconditioned by aFrobenius-norm minimization method updated with spectral corrections toreduce the normwise backward error by 10−5 for increasing size of the coarsespace. The symmetric formulation of Theorem 2 with the choice W = Vε isused for the low-rank updates.


A.7 Numerical results for the multiplicativeformulation

Example 1



m=10 m=30 m=50 m=80 m=1100 358 213 144 79 791 189 89 49 49 492 188 88 47 47 473 188 85 45 45 454 186 83 44 44 445 186 82 43 43 436 183 70 41 41 417 178 60 39 39 398 178 56 37 37 379 170 53 36 36 36

10 116 44 34 34 3411 105 40 33 33 3312 103 40 33 33 3313 97 40 33 33 3314 96 38 32 32 3215 96 38 32 32 3216 88 30 30 30 3017 88 30 30 30 3018 88 30 30 30 3019 88 29 29 29 2920 79 25 25 25 25

Table A.7.63: Number of iterations required by GMRES preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 for increasing size of thecoarse space. The preconditioner is updated in multiplicative form.

A.7. Numerical results for the multiplicative formulation 217

Example 1



m=10 m=30 m=50 m=80 m=1100 165 103 75 60 601 90 44 37 37 372 90 42 35 35 353 90 42 35 35 354 90 42 35 35 355 90 42 34 34 346 89 39 32 32 327 85 31 31 31 318 85 29 29 29 299 83 28 28 28 28

10 58 26 26 26 2611 50 25 25 25 2512 49 25 25 25 2513 47 24 24 24 2414 47 24 24 24 2415 47 24 24 24 2416 45 22 22 22 2217 45 22 22 22 2218 45 22 22 22 2219 43 21 21 21 2120 37 18 18 18 18



Example 2



m=10 m=30 m=50 m=80 m=1100 +1500 +1500 496 311 1981 190 132 98 68 682 187 123 94 66 663 176 117 92 64 644 190 116 89 61 615 189 113 87 60 606 188 110 81 59 597 150 99 73 56 568 130 94 69 55 559 129 93 68 55 55

10 127 90 67 55 5511 127 90 67 55 5512 130 94 67 54 5413 125 90 62 53 5314 113 81 49 49 4915 113 77 47 47 4716 127 85 49 49 4917 279 85 47 47 4718 130 83 48 48 4819 109 69 44 44 4420 128 78 46 46 46



Example 2



m=10 m=30 m=50 m=80 m=1100 145 110 95 76 761 82 59 47 47 472 82 56 45 45 453 80 56 45 45 454 82 56 45 45 455 80 56 45 45 456 83 55 44 44 447 68 51 42 42 428 63 48 40 40 409 60 48 40 40 40

10 60 48 40 40 4011 60 48 40 40 4012 64 49 40 40 4013 60 48 40 40 4014 47 36 32 32 3215 47 35 31 31 3116 64 47 38 38 3817 108 42 33 33 3318 64 46 37 37 3719 47 34 32 32 3220 60 42 35 35 35



Example 3



m=10 m=30 m=50 m=80 m=1100 268 174 130 79 791 163 97 52 51 512 166 96 49 49 493 170 96 47 47 474 163 88 45 45 455 156 71 43 43 436 119 58 41 41 417 119 58 41 41 418 119 57 40 40 409 89 51 37 37 37

10 80 48 36 36 3611 80 47 36 36 3612 77 46 36 36 3613 75 45 35 35 3514 75 45 35 35 3515 70 44 34 34 3416 67 38 33 33 3317 66 35 32 32 3218 63 32 31 31 3119 60 30 30 30 3020 56 29 29 29 29



Example 3



m=10 m=30 m=50 m=80 m=1100 129 89 70 57 571 87 56 39 39 392 88 56 38 38 383 89 57 38 38 384 89 55 36 36 365 82 44 34 34 346 63 33 31 31 317 64 33 31 31 318 64 33 31 31 319 49 29 29 29 29

10 45 28 28 28 2811 45 28 28 28 2812 43 28 28 28 2813 43 28 28 28 2814 43 27 27 27 2715 42 27 27 27 2716 39 26 26 26 2617 39 25 25 25 2518 35 24 24 24 2419 37 23 23 23 2320 34 22 22 22 22



Example 4



m=10 m=30 m=50 m=80 m=1100 145 113 90 71 711 103 66 48 48 482 98 64 45 45 453 98 64 45 45 454 90 59 43 43 435 93 59 43 43 436 87 57 41 41 417 75 51 40 40 408 75 51 40 40 409 76 51 40 40 40

10 64 43 35 35 3511 70 49 37 37 3712 64 40 36 36 3613 62 37 35 35 3514 61 37 35 35 3515 59 36 34 34 3416 59 36 34 34 3417 59 36 34 34 3418 59 36 34 34 3419 59 35 33 33 3320 55 34 33 33 33



Example 4



m=10 m=30 m=50 m=80 m=1100 71 57 48 48 481 52 39 34 34 342 51 35 33 33 333 50 35 33 33 334 45 32 32 32 325 45 33 32 32 326 44 31 31 31 317 36 28 28 28 288 36 28 28 28 289 37 28 28 28 28

10 27 22 22 22 2211 34 27 27 27 2712 31 26 26 26 2613 31 25 25 25 2514 30 25 25 25 2515 30 25 25 25 2516 30 25 25 25 2517 29 25 25 25 2518 29 24 24 24 2419 30 24 24 24 2420 27 23 23 23 23



Example 5



m=10 m=30 m=50 m=80 m=1100 297 87 75 66 661 137 65 48 48 482 136 65 48 48 483 119 47 43 43 434 121 47 43 43 435 123 48 43 43 436 270 47 36 36 367 38 29 29 29 298 38 30 30 30 309 38 30 30 30 30

10 38 30 30 30 3011 39 30 30 30 3012 38 29 29 29 2913 44 30 30 30 3014 37 28 28 28 2815 37 28 28 28 2816 36 28 28 28 2817 37 28 28 28 2818 41 30 30 30 3019 40 30 30 30 3020 43 30 30 30 30



Example 5



m=10 m=30 m=50 m=80 m=1100 110 46 42 42 421 46 40 32 32 322 46 40 32 32 323 46 24 24 24 244 58 24 24 24 245 59 24 24 24 246 85 24 24 24 247 20 16 16 16 168 20 16 16 16 169 20 16 16 16 16

10 20 16 16 16 1611 20 16 16 16 1612 19 16 16 16 1613 22 17 17 17 1714 19 15 15 15 1515 19 15 15 15 1516 19 15 15 15 1517 19 15 15 15 1518 23 18 18 18 1819 24 19 19 19 1920 25 19 19 19 19




Example

Nr. 1 Nr. 2 Nr. 3 Nr. 4 Nr. 50 103 161 92 61 511 97 108 75 46 372 107 +500 77 45 403 96 +500 69 49 394 93 +500 77 46 325 91 +500 68 51 646 78 +500 66 42 1867 73 +500 70 41 328 73 +500 65 42 429 68 +500 56 42 +500

10 68 +500 56 38 6611 72 +500 59 47 18312 53 +500 58 40 +50013 56 +500 49 36 +50014 40 +500 48 36 +50015 40 +500 46 36 +50016 42 +500 43 37 11617 35 +500 45 38 +50018 37 +500 45 37 +50019 41 +500 44 35 +50020 39 +500 43 35 +500

Table A.7.73: Number of iterations required by SQMR preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−8 for increasing size of thecoarse space. The symmetric formulation of Theorem 2 with the choiceW = Vε is used for the low-rank updates. The preconditioner is updated inmultiplicative form.



Example

Nr. 1 Nr. 2 Nr. 3 Nr. 4 Nr. 50 74 70 58 30 241 61 46 42 21 102 59 57 45 21 103 50 59 46 21 104 46 63 42 17 105 43 +500 38 17 106 36 +500 34 17 107 37 68 36 13 108 34 154 37 13 109 33 +500 34 13 10

10 32 +500 31 13 1611 30 +500 32 13 1012 27 51 32 13 1013 27 +500 29 13 1614 22 +500 25 13 1415 20 +500 27 13 1316 18 +500 24 13 1417 16 +500 25 13 1318 20 +500 25 13 1219 18 +500 24 13 1820 18 +500 23 13 32

Table A.7.74: Number of iterations required by SQMR preconditioned bya Frobenius-norm minimization method updated with spectral correctionsto reduce the normwise backward error by 10−5 for increasing size of thecoarse space. The symmetric formulation of Theorem 2 with the choiceW = Vε is used for the low-rank updates. The preconditioner is updated inmultiplicative form.

Bibliography

[1] G. Alleon, S. Amram, N. Durante, P. Homsi, D. Pogarieloff, andC. Farhat. Massively parallel processing boosts the solution ofindustrial electromagnetic problems: High performance out-of-coresolution of complex dense systems. In M. Heath, V. Torczon,G. Astfalk, P. E. Bjørstad, A. H. Karp, C. H. Koebel, V. Kumar,R. F. Lucas, L. T. Watson, and Editors D. E. Womble, editors,Proceedings of the Eighth SIAM Conference on Parallel Processing forScientific Computing. SIAM Book, Philadelphia, 1997. Conferenceheld in Minneapolis, Minnesota, USA.

[2] G. Alleon, M. Benzi, and L. Giraud. Sparse approximate inversepreconditioning for dense linear systems arising in computationalelectromagnetics. Numerical Algorithms, 16:1–15, 1997.

[3] F. Alvarado and H. Dag. Sparsified and incomplete sparse factoredinverse preconditioners. In Copper Mountain Conference on IterativeMethods. Preliminary Proceedings., volume I, April 9-14, 1992.

[4] H. Anastassiu and J. L. Volakis. An AIM based analysis ofscattering from cylindrically periodic structures. IEEE Antennas andPropagation Society International Symposium Digest, pages 60–63,1997.

[5] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra,J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, andD. Sorensen. LAPACK Users’ Guide. Society for Industrial andApplied Mathematics, Philadelphia, PA, third edition, 1999.

[6] A. W. Appel. An efficient program for many-body simulation. SIAMJ. Scientific and Statistical Computing, 6:85–103, 1985.

[7] J. Baglama, D. Calvetti, G. H. Golub, and L. Reichel. Adaptivelypreconditioned GMRES algorithms. SIAM J. Scientific Computing,20(1):243–269, 1999.

[8] J. Barnes and P. Hut. A hierarchical O(n log n) force calculationalgorithm. Nature, 324:446–449, 1986.

229


[9] A. Bayliss, C. I. Goldstein, and E. Turkel. On accuracy conditionsfor the numerical computation of waves. J. Comp. Phys., 59:396–404,1985.

[10] M. Bebendorf. Approximation of boundary element matrices.Numerische Mathematik, 86(4):565–589, 2000.

[11] A. Bendali. Approximation par elements finis de surface de problemesde diffraction des ondes electro-magnetiques. PhD thesis, UniversiteParis VI , 1984.

[12] M. W. Benson. Iterative solution of large scale linear systems. Master’sthesis, Lakehead University, Thunder Bay, Canada, 1973.

[13] M. W. Benson and P. O. Frederickson. Iterative solution of large sparselinear systems arising in certain multidimensional approximationproblems. Utilitas Mathematica, 22:127–140, 1982.

[14] M. W. Benson, J. Krettmann, and M. Wright. Parallel algorithms forthe solution of certain large sparse linear systems. Int J. of ComputerMathematics, 16, 1984.

[15] M. Benzi, J. Marin, and M. Tuma. A two-level parallel preconditionerbased on sparse approximate inverses. In D.R. Kincaid and A.C.Elster, editors, Iterative Methods in Scientific Computation IV,IMACS Series in Computational and Applied Mathematics, pages 167–178. IMACS, New Brunswick, NJ, 1999.

[16] M. Benzi, C. D. Meyer, and M. Tuma. A sparse approximate inversepreconditioner for the conjugate gradient method. SIAM J. ScientificComputing, 17:1135–1149, 1996.

[17] M. Benzi, D. B. Szyld, and A. van Duin. Orderings for incompletefactorization preconditioning of nonsymmetric problems. SIAM J.Scientific Computing, 20:1652–1670, 1999.

[18] M. Benzi and M. Tuma. A comparison of some preconditioningtechniques for general sparse matrices. In Iterative Methods in LinearAlgebra, II, ed. P. Vassilevski and S. Margenov, IMACS Series inComputational and Applied Mathematics, vol. 3, Piscataway, NJ(1996), pp. 191-203.

[19] M. Benzi and M. Tuma. A sparse approximate inverse preconditionerfor nonsymmetric linear systems. SIAM J. Scientific Computing,19:968–994, 1998.

[20] J.-P. Berenger. A perfectly matched layer for the absorption ofelectromagnetic waves. J. Comp. Phys., 114:185–200, 1994.


[21] R. Bramley and V. Menkov. Parallel preconditioners with low rankoff-diagonal blocks. 1996. Submitted to Parallel Computing.

[22] B. Carpentieri, I. S. Duff, and L. Giraud. Sparse pattern selectionstrategies for robust Frobenius-norm minimization preconditioners inelectromagnetism. Numerical Linear Algebra with Applications, 7(7-8):667–685, 2000.

[23] B. Carpentieri, I. S. Duff, L. Giraud, and G. Sylvand. Combiningfast multipole techniques and an approximate inverse preconditionerfor large parallel electromagnetics calculations. Technical Report inpreparation, CERFACS, Toulouse, France.

[24] L. M. Carvalho, L. Giraud, and P. Le Tallec. Algebraic two-levelpreconditioners for the Schur complement method. SIAM J. ScientificComputing, 22(6):1987–2005, 2001.

[25] K. Chadan, D. Colton, L. Paivarinta, and W. Rundell. An introductionto Inverse Scattering and Inverse Spectral Problems. SIAM Book,Philadelphia, 1997.

[26] T. Chan and T.P. Mathew. Domain Decomposition Algorithms,volume 3 of Acta Numerica, pages 61–143. Cambridge UniversityPress, Cambridge, 1994.

[27] T. Chan and H. A. van der Vorst. Approximate andincomplete factorizations. In D. E. Keyes, A. Sameh, andV. Venkatakrishnan, editors, Parallel Numerical Algorithms,ICASE/LaRC Interdisciplinary Series in Science and EngineeringVolume 4, Kluwer Academic, Dordecht, 1997, pages 167–202, 1997.

[28] K. Chen. On a class of preconditioning methods for dense linearsystems from boundary elements. SIAM J. Scientific Computing,20(2):684–698, 1998.

[29] W. C. Chew, J. M. Jin, C. C. Lu, E. Michielssen, and J. M. Song.Fast solution methods in electromagnetics. IEEE Transactions onAntennas and Propagation, 45(3):533–543, 1997.

[30] W. C. Chew and C. C. Lu. The use of Huygens’equivalence principlefor solving 3D volume integral equation of scattering. IEEE Trans.Ant. Prop., 43(5):500–507, 1995.

[31] W. C. Chew, C. C. Lu, and Y. M. Wang. Review of efficientcomputation of three-dimensional scattering of vector electromagneticwaves. J. Opt. Soc. Am. A, 11:1528–1537, 1994.


[32] W. C. Chew and Y. M. Wang. A recursive T-matrix approach forthe solution of electromagnetic scattering by many spheres. IEEETransactions on Antennas and Propagation, 41(12):1633–1639, 1993.

[33] E. Chow. Parallel implementation and practical use of sparseapproximate inverse preconditioners with a priori sparsity patterns.Int. J. High Perf. Comput. Apps., 15:56–74, 2001.

[34] E. Chow and Y. Saad. Experimental study of ILU preconditionersfor indefinite matrices. Journal of Computational and AppliedMathematics, 86:387–414, 1997.

[35] E. Chow and Y. Saad. Approximate inverse preconditioners via sparse-sparse iterations. SIAM J. Scientific Computing, 19(3):995–1023,1998.

[36] A. Cosnau. Etude d’un preconditionneur pour les matrices complexesdense symmetric issues des equations de Maxwell en formulationintegrale. Note technique ONERA, 1996. 142328.96/DI/MT.

[37] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetricmatrices. In Proceedings 24th National Conference of the Associationfor Computing Machinery, Brandon Press, New Jersey, pages 157–172. Brandon Press, New Jersey, 1969.

[38] E. Darve. The fast multipole method (I): Error analysis andasymptotic complexity. SIAM J. Numerical Analysis, 38(1):98–128,2000.

[39] E. Darve. The fast multipole method: Numerical implementation. J.Comp. Phys., 160(1):195–240, 2000.

[40] H. Dag. Iterative Methods and Parallel Computation for PowerSystems. PhD thesis, Department of Electrical Engineering. Universityof Winsconsin, Madison, WI, 1996.

[41] E. F. D’Azevedo, P. A. Forsyth, and W.-P. Tang. Drop tolerancepreconditioning for incompressible viscous flow. Int. J. ComputerMathematics, 44:301–312, 1992.

[42] C. de Boor. Dichotomies for band matrices. SIAM J. NumericalAnalysis, 17:894–907, 1980.

[43] E. de Sturler. Inner-outer methods with deflation for linear systemswith multiple right-hand sides. In Householder Symposium XIII,Proceedings of the Householder Symposium on Numerical Algebra,Pontresina, Switzerland, pages 193–196, June 17 - 26, 1996.


[44] B. Dembart and M. A. Epton. A 3D fast multipole method forelectromagnetics with multiple levels. Tech. Rep. ISSTECH-97-004,The Boeing Company, Seattle, WA, 1994.

[45] B. Dembart and M. A. Epton. Low frequency multipole translationtheory for the Helmholtz equation. Tech. Rep. SSGTECH-98-013, TheBoeing Company, Seattle, WA, 1998.

[46] B. Dembart and M. A. Epton. Spherical harmonic analysis andsynthesis for the fast multipole method. Tech. Rep. SSGTECH-98-014, The Boeing Company, Seattle, WA, 1998.

[47] B. Dembart and E. Yip. Matrix assembly in FMM-MOM codes. Tech.Rep. ISSTECH-97-002, The Boeing Company, Seattle, WA, 1997.

[48] S. Demko. Inverses of band matrices and local convergence of splineprojections. SIAM J. Numerical Analysis, 14:616–619, 1977.

[49] S. Demko, W. F. Moss, and P. W. Smith. Decay rates for inverses ofband matrices. Mathematics of Computation, 43:491–499, 1984.

[50] B. Depres. Quadratic functional and integral equations for harmonicwave problems in exterior domains. Mathematical Modelling andNumerical Analysis, 31(6):679–732, 1997.

[51] J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. Algorithm679: A set of level 3 basic linear algebra subprograms. ACM Trans.Math. Softw., 16:18–28, 1990.

[52] J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. A set oflevel 3 basic linear algebra subprograms. ACM Trans. Math. Softw.,16:1–17, 1990.

[53] I. S. Duff, R. G. Grimes, and J. G. Lewis. User’s guide for Harwell-Boeing sparse matrix test problems collection. Tech. Report RAL-92-086, Computing and Information Systems Department, RutherfordAppleton Laboratory, Didcot, UK, 1992.

[54] I. S. Duff and G. A. Meurant. The effect of ordering on preconditionedconjugate gradient. BIT, 29:635–657, 1989.

[55] I. S. Duff and H. A. van der Vorst. Preconditioning and parallelpreconditioning. Tech. Rep. TR/PA/98/23, CERFACS, France, 1998.

[56] A. Edelman. The first annual large dense linear system survey. TheSIGNUM Newsletter, 26:6–12, 1991.


[57] A. Edelman. Large dense numerical linear algebra in 1993:The parallel computing influence. Journal of SupercomputingApplications., 7:113–128, 1993.

[58] V. Eijkhout and B. Polman. Decay rates of inverses of banded M-matrices that are near to Toeplitz matrices. Linear Algebra and itsApplications, 109:247–277, 1988.

[59] J. Erhel, K. Burrage, and B. Pohl. Restarted GMRES preconditionedby deflation. J. Comput. Appl. Math., 69:303–318, 1996.

[60] Q. Fan, P. A. Forsyth, W.-P. Tang, and J. R. F. McMacken.Performance issues for iterative solvers in semiconductor devicesimulation. SIAM J. Scientific Computing, 1:100–117, 1996.

[61] M. R. Field. An efficient parallel preconditioner for the conjugategradient algorithm. Technical Report HDL-TR-97-175, Hitachi DublinLaboratory, Trinity College, Dublin, 1998.

[62] V. Fraysse and L. Giraud. An implementation of block QMR forJ-symmetric matrices. Technical Report TR/PA/97/57, CERFACS,Toulouse, France, 1997.

[63] V. Fraysse, L. Giraud, and S. Gratton. A set of GMRES routines forreal and complex arithmetics. Tech. Rep. TR/PA/97/49, CERFACS,1997.

[64] V. Fraysse, L. Giraud, and S. Gratton. A set of Flexible-GMRES routines for real and complex arithmetics. Technical ReportTR/PA/98/20, CERFACS, Toulouse, France, 1998.

[65] P. O. Frederickson. Fast approximate inversion of large sparse linearsystems. Math. Report 7, Lakehead University, Thunder Bay, Canada,1975.

[66] R. W. Freund. A transpose-free quasi-minimal residual algorithmfor non-Hermitian linear systems. SIAM J. Scientific Computing,14(2):470–482, 1993.

[67] R. W. Freund and N. M. Nachtigal. QMR: a quasi-minimal residualmethod for non-Hermitian linear systems. Numerische Mathematik,60(3):315–339, 1991.

[68] R. W. Freund and N. M. Nachtigal. An implementation of the QMRmethod based on coupled two-term recurrences. SIAM J. ScientificComputing, 15(2):313–337, 1994.


[69] R. W. Freund and N. M. Nachtigal. Software for simplified Lanczosand QMR algorithms. Applied Numerical Mathematics, 19:319–341,1995.

[70] R. W. Freund and N. M. Nachtigal. QMRPACK: a package of QMRalgorithms. ACM Transactions on Mathematical Software, 22:46–77,1996.

[71] J. George and J. W. H. Liu. The evolution of the minimum degreeordering algorithm. SIAM Review, 31:1–19, 1989.

[72] J. R. Gilbert. Predicting structure in sparse matrix computations.SIAM J. Matrix Analysis and Applications, 15:62–79, 1994.

[73] G. H. Golub and C. F. Van Loan. Matrix computations. Johns HopkinsStudies in the Mathematical Sciences. The Johns Hopkins UniversityPress, Baltimore, MD, USA, third edition, 1996.

[74] G. H. Golub and H. A. van der Vorst. Closer to the solution: iterativelinear solvers. Technical report, In I.S. Duff and G.A. Watson editors,1997. The State of the Art in Numerical Analysis.

[75] S. A. Goreinov, E. E. Tyrtyshnikov, and A. Yu Yeremin. Matrix-freeiterative solution strategies for large dense linear systems. NumericalLinear Algebra with Applications, 4(4):273–294, 1997.

[76] N. I. M. Gould and J. A. Scott. On approximate-inversepreconditioners. Tech. Rep. 95-026, RAL, 1995.

[77] N. I. M. Gould and J. A. Scott. Sparse approximate-inversepreconditioners using norm-minimization techniques. SIAM J.Scientific Computing, 19(2):605–625, 1998.

[78] A. Grama, V. Kumar, and A. Sameh. On n-body simulationsusing message-passing parallel computers. In Sidney Karin, editor,Proceedings of the 1995 SIAM Conference on Parallel Processing, SanFrancisco, CA, USA, 1995.

[79] A. Grama, V. Kumar, and A. Sameh. Parallel matrix-vector productusing approximate hierarchical methods. In Sidney Karin, editor,Proceedings of the 1995 ACM/IEEE Supercomputing Conference,December 3–8, 1995, San Diego Convention Center, San Diego, CA,USA, New York, NY, USA, 1995. ACM Press and IEEE ComputerSociety Press.

[80] A. Grama, V. Kumar, and A. Sameh. Scalable parallel formulations ofthe Barnes–Hut method for n-body simulations. Parallel Computing,24(5–6):797–822, 1998.


[81] L. Greengard and W. Gropp. A parallel version of the fast multipolemethod. Comput. Math. Appl., 20:63–71, 1990.

[82] L. Greengard and V. Rokhlin. A fast algorithm for particlesimulations. Journal of Computational Physics, 73:325–348, 1987.

[83] M. Grote. Nonreflecting boundary conditions for electromagneticscattering. Int. J. Numer. Model., 13:397–416, 2000.

[84] M. Grote and T. Huckle. Parallel preconditionings with sparseapproximate inverses. SIAM J. Scientific Computing, 18:838–853,1997.

[85] W. Hackbusch. Multigrid methods and applications. Springer-Verlag,1985.

[86] R. Harrington. Origin and development of the Method of Moments forfield computation. IEEE Antennas and Propagation Magazine, 1990.

[87] HSL. A collection of Fortran codes for large scale scientificcomputation, 2000. http://www.numerical.rl.ac.uk/hsl.

[88] I. C. F. Ipsen and C. D. Meyer. The idea behind Krylovmethods. Tech. Rep. CRSC-TR97-3, NCSU Center For Research InScientific Computation, January 31, 1997. To Appear in AmericanMathematical Monthly.

[89] J. M. Jin and V. V. Liepa. A note on hybrid finite element method forsolving scattering problems. IEEE Trans. Ant. Prop., 36(10):1486–1489, 1988.

[90] W. R. Scott Jr. Errors due to spatial discretization and numericalprecision in the finite-element method. IEEE Trans. Ant. Prop.,42(11):1565–1569, 1994.

[91] I. E. Kaporin. A preconditioned conjugate gradient method for solvingdiscrete analogs of differential problems. Differential Equations,26:897–906, 1990.

[92] S. A. Kharchenko and A. Yu. Yeremin. Eigenvalue translation basedpreconditioners for the GMRES(k) method. Numerical Linear Algebrawith Applications, 2(1):51–77, 1995.

[93] L. Yu. Kolotilina. Explicit preconditioning of systems of linearalgebraic equations with dense matrices. J. Sov. Math., 43:2566–2573, 1988. English translation of a paper first published in ZapisliNauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogoim. V.A. Steklova AN SSSR 154 (1986) 90-100.


[94] L. Yu. Kolotilina. Twofold deflation preconditioning of linearalgebraic systems. I. Theory. Technical Report EM-RR 20/95, ElegantMathematics, Inc., 1995. Available in Postscript format at the URLhttp://www.elegant-math.com/abs-emrr.htm.

[95] L. Yu Kolotilina and A. Yu. Yeremin. Factorized sparse approximateinverse preconditionings. I: Theory. SIAM J. Matrix Analysis andApplications, 14:45–58, 1993.

[96] L. Yu Kolotilina and A. Yu. Yeremin. Factorized sparse approximateinverse preconditionings. II: Solution of 3D FE systems on massivelyparallel computers. Int J. High Speed Computing, 7:191–215, 1995.

[97] L. Yu Kolotilina, A. Yu. Yeremin, and A. A. Nikishin. Factorizedsparse approximate inverse preconditionings. IV: Simple approaches torising efficiency. Numerical Linear Algebra with Applications, 6:515–531, 1999.

[98] L. Yu Kolotilina, A. Yu. Yeremin, and A. A. Nikishin.Factorized sparse approximate inverse preconditionings. III: Iterativeconstruction of preconditioners. Journal of Mathematical Sciences,101:3237–3254, 2000. Originally published in Russian inZap. Nauchn. Semin. POMI, 248:17-48, 1998.

[99] K. S. Kunz and R. J. Luebbers. The Finite Difference Time DomainMethod for Electromagnetics. CRC Press, Boca Raton, 1993.

[100] R. Lee and A. C. Cangellaris. A study of discretization error in thefinite element approximation of wave solution. IEEE Trans. Ant.Prop., 40(5):542–549, 1992.

[101] S. W. Lee, H. Ling, and R. C. Chou. Ray tube integration in shootingand bouncing ray method. Micro. Opt. Tech. Lett., 1:285–289, 1988.

[102] Z. Li, Y. Saad, and M. Sosonkina. pARMS: a parallel version ofthe algebraic recursive multilevel solver. Technical Report umsi-2001-100, Minnesota Supercomputer Institute, University of Minnesota,Minneapolis, MN, 2001.

[103] J. C. Maxwell. A dynamical theory of the electromagnetic field. RoyalSociety Transactions, CLV, 1864. Reprinted in R. A. R. Tricker,The Contributions of Faraday and Maxwell to Electrical Science,(Pergamon Press, 1966).

[104] B. McDonald and A. Wexler. Finite element solution of unboundedfield problem. IEEE Trans. Microwave Theory Tech., 20:841–847,1972.


[105] J. A. Meijerink and H. A. van der Vorst. An iterative solution methodfor linear systems of which the coeffcient matrix is a symmetric M-matrix. Mathematics of Computation, 31:148–162, 1977.

[106] G. Meurant. A review on the inverse of symmetric tridiagonal andblock tridiagonal matrices. SIAM J. Matrix Analysis and Applications,13:707–728, 1992.

[107] E. Michielssen and A. Boag. Multilevel evaluation of electromagneticfields for the rapid solution of scattering problems. Micro. Opt. Tech.Lett., 7(17):790–795, 1994.

[108] E. Michielssen and A. Boag. A multilevel matrix decompositionalgorithm for analyzing scattering from large structures. IEEETransactions on Antennas and Propagation, 44(8):1086 –1093, 1996.

[109] M. Magolu monga Made. Incomplete factorization basedpreconditionings for solving the Helmholtz equation. Int. Journal forNumerical Methods in Engineering, 50(5):1077–1101, 2001.

[110] M. Magolu monga Made, R. Beauwens, and G. Warzee.Preconditioning of discrete Helmholtz operators perturbed by adiagonal complex matrix. Communications in Numerical Methods inEngineering, 11:801–817, 2000.

[111] M. Magolu monga Made and H. A. van der Vorst. ParIC: Afamily of parallel incomplete Cholesky preconditioners. In M. Bubak,H. Afsarmanesh, R. Williams, and B. Hertzberger, editors, HighPerformance Computing and Networking. Proceedings of the HPCNEurope 2000 Conference, Amsterdam. Lecture Notes in ComputerScience, 1823, pages 89–98. Springer-Verlag, Berlin, 2000.

[112] R. B. Morgan. Implicitely restarted GMRES and Arnoldi methodsfor nonsymmetric systems of equations. SIAM J. Matrix Analysis andApplications, 21(4):1112–1135.

[113] R. B. Morgan. A restarted GMRES method augmented witheigenvectors. SIAM J. Matrix Analysis and Applications, 16:1154–1171, 1995.

[114] A. Pothen, H. D. Simon, and K. P. Liou. Partitioning sparsematrices with eigenvectors of graphs. SIAM J. Matrix Analysis andApplications, 11(3):430–452, 1990.

[115] J. Rahola. Experiments on iterative methods and the fast multipolemethod in electromagnetic scattering calculations. Technical ReportTR/PA/98/49, CERFACS, Toulouse, France, 1998.


[116] S. M. Rao, D. R. Wilton, and A. W. Glisson. Electromagneticscattering by surfaces of arbitrary shape. IEEE Trans. AntennasPropagat., AP-30:409–418, 1982.

[117] J.-C. Rioual. Solving linear systems for semiconductor devicesimulations on parallel distributed computers. PhD thesis, CERFACS,Toulouse, France, 2002.

[118] J. W. Ruge and K. Stuben. Algebraic multigrid (AMG). 1987.In Multigrid Methods, Frontiers in Applied Mathematics 3, S.F.McCormick, ed., SIAM, Philadelphia, PA, pp. 73-130.

[119] Y. Saad. Projection and deflation methods for partial pole assignmentin linear state feedback. IEEE Trans. Automat. Contr., 33(3):290–297,1988.

[120] Y. Saad. Analysis of augmented Krylov subspace techniques. SIAMJ. Scientific Computing, 14:461–469, 1993.

[121] Y. Saad. A flexible inner-outer preconditioned GMRES algorithm.SIAM J. Scientific and Statistical Computing, 14:461–469, 1993.

[122] Y. Saad. Iterative Methods for Sparse Linear Systems. PWSPublishing, New York, 1996.

[123] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residualalgorithm for solving nonsymmetric linear systems. SIAM J. Scientificand Statistical Computing, 7:856–869, 1986.

[124] K. E. Schmidt and M. A. Lee. Implementing the fast multipole methodin three dimensions. J. Statist. Phys., 63:1120, 1991.

[125] P. P. Silvester and R. L. Ferrari. Finite Elements for ElectricalEngineers. Cambridge University Press, Cambridge, 1990.

[126] J. P. Singh, C. Holt, T. Totsuka, A. Gupta, and J. L. Hennessy.Load Balancing and Data Locality in Adaptive Hierarchical n-bodyMethods: Barnes-Hut, Fast Multiple, and Radiosity. Journal ofParallel and Distributed Computing, 27:118–141, 1995.

[127] G. L. G. Sleijpen and H. A. van der Vorst. Maintaining convergenceproperties of Bi-CGSTAB methods in finite precision arithmetic.Numerical Algorithms, 10:203–223, 1995.

[128] G. L. G. Sleijpen and H. A. van der Vorst. Reliable updated residualsin hybrid Bi-CG methods. Computing, 56:141–163, 1996.


[129] G. L. G. Sleijpen, H. A. van der Vorst, and D. R. Fokkema. Bi-CGSTAB(l) and other hybrid Bi-CG methods. Numerical Algorthms,7:75–109, 1994.

[130] B. F. Smith, P. Bjørstad, and W. Gropp. Domain Decomposition,Parallel Multilevel Methods for Elliptic Partial Differential Equations.Cambridge University Press, New York, 1st edition, 1996.

[131] P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linearsystems. SIAM J. Scientific and Statistical Computing, 10:36–52,1989.

[132] D. C. Sorensen. Implicit application of polynomial filters in a k-stepArnoldi method. SIAM J. Matrix Analysis and Applications, 13:357–385, 1992.

[133] R. Suda. New Iterative Linear Solvers for Parallel Circuit Simulation.PhD thesis, Department of Information Sciences, University of Tokio,1996.

[134] X. Sun and N. P. Pitsianis. A Matrix Version od the Fast MultipoleMethod. SIAM Review, 43(2):289–300, 2001.

[135] G. Sylvand. Resolution Iterative de Formulation Integrale pourHelmholtz 3D: Applications de la Methode Multipole a des Problemesde Grande Taille. PhD thesis, Ecole Nationale des Ponts et Chaussees,2002.

[136] D. B. Szyld and J. A. Vogel. A flexible quasi-minimal residual methodwith inexact preconditioning. SIAM J. Scientific Computing, 23:363–380, 2001.

[137] A. Taflove. Computational Electrodynamics: The Finite-DifferenceTime-Domain Method. Artech House, Boston, 1995.

[138] W.-P. Tang. Schwartz splitting and template operators. PhD thesis,Computer Science Dept., Stanford University, Stanford, CA, 1987.

[139] W.-P. Tang. Towards an effective sparse approximate inversepreconditioners. SIAM J. Matrix Analysis and Applications,20(4):970–986, 1998.

[140] W.-P. Tang and W. L. Wan. Sparse approximate inverse smootherfor multigrid. SIAM J. Matrix Analysis and Applications, 21(4):1236–1252, 2000.

[141] W. F. Tinney and J. W. Walker. Direct solutions of sparse networkequations by optimally ordered triangular factorization. Proc. of theIEEE, 55:1801–1809, 1967.


[142] H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly convergingvariant of Bi-CG for the solution of nonsymmetric linear systems.SIAM J. Scientific and Statistical Computing, 13:631–644, 1992.

[143] H. A. van der Vorst and C. Vuik. The superlinear convergencebehaviour of GMRES. J. Comput. Appl. Math., 48:327–341, 1993.

[144] S. A. Vavasis. Preconditioning for boundary integral equations. SIAMJ. Matrix Analysis and Applications, 13:905–925, 1992.

[145] J. L. Volakis, A. Chatterjee, and L. C. Kempel. Finite element methodsfor electromagnetics. IEEE Press, Piscataway, NJ, 1998.

[146] J. W. Watts. A conjugate gradient truncated direct method for theiterative solution of the reservoir simulation pressure equation. Societyof Petroleum Engineers Journal, 21:345–353, 1981.

[147] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford UniversityPress, Walton Street, Oxford OX2 6DP, UK, 1965.

[148] J. Zhang. A sparse approximate inverse technique for parallelpreconditioning of general sparse matrices. Tech. Rep. 281-98,Department of Computer Science, University of Kentucky, KY, 1998.Accepted for publication in Applied Mathematics and Computation.

[149] F. Zhao and S. L. Johnsson. The parallel multipole method on theconnection machine. SIAM J. Scientific and Statistical Computing,12:1420–1437, 1991.

PhD Thesis - ICINo Ordre: 1879 PhD Thesis Sp´ecialit´e : Informatique Sparse preconditioners for...

Documents

Transcript of PhD Thesis - ICINo Ordre: 1879 PhD Thesis Sp´ecialit´e : Informatique Sparse preconditioners for...