Page 1: Solving large sparse Ax = b. Stopping criteria, & backward stability of MGS-GMRES.

Chris Paige (McGill University);
Miroslav Rozložník & Zdeněk Strakoš (Academy of Sciences of the Czech Republic).

.pdf & .ps files of this talk are available from:
http://www.cs.mcgill.ca/~chris/pub/list.html

Page 2: Background

The talk starts on slide 9, after this background.

This talk discusses material that the three of us have been interested in for many years.

About 1/2 of this talk was given by Chris Paige at an excellent conference to celebrate Bob Russell:
The International Conference on Adaptivity and Beyond: Computational Methods for Solving Differential Equations. Vancouver, August 3–6, 2005.

The response motivated us to distribute it widely, and to encourage writers to present the ideas in texts that applications-oriented people might turn to.

Page 3: Re: "Backward Errors"

The backward error (BE) material for this appears in the literature. The backward error theory and history are given elegantly by Higham, 2nd Edn., 2002: §1.10; pp. 29–30; Chapter 7, in particular §7.1, 7.2 and 7.7; and also by Stewart & Sun, 1990, Section III/2.3; Meurant, 1999, Section 2.7; among others. But this material is not easily accessible to the non-expert.

The original BE references are:
Prager & Oettli, Num. Math. 1964, for componentwise analysis, which led to:
Rigal & Gaches, J. Assoc. Comput. Mach. 1967, for normwise analysis (used here).

Page 4: Re: "Stopping Criteria" (part 1)

The relation of BEs to stopping criteria for Ax = b was described by Rigal & Gaches, 1967, §5, and is explained and thoroughly discussed in Higham, 2nd Edn., 2002, §17.5; and in "Templates", Barrett et al., 1995, §4.2.

These ideas have been used for constructing stopping criteria for years. For example, in Paige & Saunders, ACM Trans. Math. Software 1982, the backward error idea is used to derive a family of stopping criteria which quantify the levels of confidence in A and b, and which are implemented in the generally available software realization of the LSQR method.

Page 5: Re: "Stopping Criteria" (part 1)

For other general considerations, methodology and applications see:
Arioli, Duff & Ruiz, SIAM J. Matrix Anal. Appl. 1992;
Arioli, Demmel & Duff, SIAM J. Matrix Anal. Appl. 1989;
Chatelin & Frayssé, 1996;
Kasenally & Simoncini, SIAM J. Numer. Anal. 1997.

For more recent sources see:
Arioli, Noulard & Russo, Calcolo 2001;
Arioli, Loghin & Wathen, Numer. Math. 2005;
Paige & Strakoš, SIAM J. Sci. Comput. 2002;
Strakoš & Liesen, ZAMM 2005.

Page 6: Re: "Stopping Criteria" (part 1)

These ideas are not widely used by the applications community, apparently because very little attention has been paid to stopping criteria in some major numerical linear algebra or iterative methods textbooks (e.g. Watkins, Demmel, Bau & Trefethen, Saad), or reference books (e.g. Golub & Van Loan). They are not spelt out in some other leading books on iterative methods (e.g. Axelsson, Greenbaum, Meurant), but references are given in van der Vorst. Deuflhard & Hohmann, §2.4.3, do introduce the topic.

It would be healthy for users, and also for our community, if stopping criteria were considered to be fundamental parts of iterative computations, rather than treated as miscellaneous issues (if at all).

Page 7: Re: "Stopping Criteria" (part 1)

This talk presents the backward error ideas in a simple form for use in stopping criteria for iterative methods.

It emphasizes that the normwise relative backward error (NRBE) is the one to use when you know your algorithm is backward stable. It should convince the user that, unless there is a good reason to prefer some other stopping criterion, NRBE should be used in science and engineering calculations.

For clarity we will mainly use the 2-norm here, but other subordinate matrix norms are possible. See e.g. Higham, 2nd Edn. 2002, §7.1.

Page 8: Re: "Stopping Criteria" (part 1)

An example where some other stopping criteria are preferable: conjugate gradient methods for solving discretized elliptic self-adjoint PDEs. See:
Arioli, Numer. Math. 2004;
Dahlquist, Eisenstat & Golub, J. Math. Anal. Appl. 1972;
Dahlquist, Golub & Nash, 1978;
Hestenes & Stiefel, J. Res. Nat. Bur. Standards 1952;
Meurant, Numerical Algorithms 1999;
Meurant & Strakoš, Acta Numerica 2006;
Strakoš & Tichý, ETNA 2002;
Strakoš & Tichý, BIT 2005.

Pages 9–12: Notation

Vectors: a, b, x, . . . ;  iterates x1, x2, . . .
Vector norm: ‖x‖2 = √(xᵀx).
Matrices: nonsingular A ∈ R^{n×n}; B, . . .
Singular values: σ1(A) ≥ . . . ≥ σn(A) > 0.
Condition number: κ2(A) = σ1(A)/σn(A).
Computer precision: ε ≈ 10^{-16} (IEEE double).

Page 13: Matrix norms

The results hold for general subordinate matrix norms. For clarity, we just consider:

Spectral norm:  ‖A‖2 = max{ ‖Ax‖2 : ‖x‖2 = 1 } = σ1(A).

Frobenius norm:  ‖A‖F² = trace(AᵀA) = Σ_{i=1}^{n} σi(A)².

Matrix norms for rank-one matrices: if B = c dᵀ, then

‖B‖2 = ‖c dᵀ‖2 = ‖c‖2 ‖d‖2 = ‖c dᵀ‖F = ‖B‖F.
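The rank-one identity above is easy to check numerically; here is a minimal NumPy illustration (the vectors c and d are arbitrary examples, not data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.standard_normal(5)          # arbitrary example vectors
d = rng.standard_normal(7)
B = np.outer(c, d)                  # rank-one matrix B = c d^T

print(np.linalg.norm(B, 2),         # spectral norm sigma_1(B)
      np.linalg.norm(B, 'fro'),     # Frobenius norm
      np.linalg.norm(c) * np.linalg.norm(d))
# all three printed values agree (up to rounding), as stated on the slide
```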

Pages 14–15: Iterative methods – large Ax = b

Produce approximations to the solution x:

x1, x2, . . . , xk, . . .

with residuals

. . . , rk = b − Axk, . . .

Each iteration is expensive, so we hope for ≪ n steps.

When do we STOP?

Page 16: Data accurate to O(ε) (relatively)

We will first treat the case of finding an xk about as good as we can hope for with the given data A and b, using computer precision ε and a numerically stable algorithm.

Later we will consider inaccurate data.

Pages 17–22: Basic stopping criteria

• Test the residual norm, e.g.  ‖rk‖2 ≤ O(ε).
  But what if ‖b‖2 is huge, or tiny?

• Test the relative residual, e.g.

  ‖rk‖2 / ‖b‖2 = ‖b − Axk‖2 / ‖b‖2 ≤ O(ε) ???

• Test the Normwise Relative Backward Error, e.g.

  ‖rk‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2) ≤ O(ε).

• Why use NRBE? (‖·‖1, ‖·‖∞, ‖·‖F, etc.)
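These three tests are cheap to evaluate once a residual is available. As a minimal illustrative sketch (a user-chosen threshold tol plays the role of O(ε), and A is taken dense so that ‖A‖2 can be computed directly):

```python
import numpy as np

def stopping_tests(A, b, xk, tol):
    """Evaluate the three tests above for a dense A.

    For a large sparse A the 2-norm ||A||_2 would be estimated rather than
    computed exactly; see the "Computing ||A||_2 ?" slides later in the talk.
    """
    rk = b - A @ xk                              # true residual r_k = b - A x_k
    res_norm = np.linalg.norm(rk)                # ||r_k||_2
    rel_res = res_norm / np.linalg.norm(b)       # ||r_k||_2 / ||b||_2
    nrbe = res_norm / (np.linalg.norm(b)
                       + np.linalg.norm(A, 2) * np.linalg.norm(xk))
    return {"residual norm": res_norm <= tol,
            "relative residual": rel_res <= tol,
            "NRBE": nrbe <= tol}
```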

Pages 23–26: A Backward Stable (BS) Alg.

A backward stable algorithm will eventually give the exact answer to a nearby problem, e.g. for the 2-norm case: an iterate xk satisfying

(A + δAk) xk = b + δbk,
‖δAk‖2 ≤ O(ε)‖A‖2,  ‖δbk‖2 ≤ O(ε)‖b‖2.

(J. H. Wilkinson, 1950s, for n-step algorithms, e.g.
Cholesky: (A + δA) xc = b,  ‖δA‖2 ≤ 12 n²ε ‖A‖2.)

Such an xk is called a backward stable solution; δAk and δbk can be called backward errors.

Then the true residual rk will satisfy

rk = b − Axk = δAk xk − δbk,
‖rk‖2 ≤ O(ε)(‖A‖2 ‖xk‖2 + ‖b‖2),

and so

NRBE = ‖rk‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2) ≤ O(ε),

satisfying the simple 2-norm NRBE test.
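The residual bound above is just the triangle inequality applied to rk = δAk xk − δbk; spelled out:

```latex
\|r_k\|_2 = \|\delta A_k x_k - \delta b_k\|_2
          \le \|\delta A_k\|_2 \|x_k\|_2 + \|\delta b_k\|_2
          \le O(\varepsilon)\left( \|A\|_2 \|x_k\|_2 + \|b\|_2 \right),
```

and dividing by ‖b‖2 + ‖A‖2‖xk‖2 gives NRBE ≤ O(ε).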

Pages 27–29: If and only if?

Here a backward stable solution xk satisfies

NRBE = ‖b − Axk‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2) ≤ O(ε).   (∗)

But if an xk satisfies this, is it necessarily a backward stable solution?

YES. Rigal & Gaches, JACM 1967: If xk satisfies (∗) then there exist backward errors δAk & δbk such that

(A + δAk) xk = b + δbk,
‖δAk‖2 ≤ O(ε)‖A‖2,  ‖δbk‖2 ≤ O(ε)‖b‖2.

Pages 30–31: Proof

Suppose the 2-norm NRBE ≤ O(ε). Take

δAk = { ‖A‖2 ‖xk‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2) } · rk xkᵀ / ‖xk‖2²,

δbk = − { ‖b‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2) } · rk.

Then δAk xk − δbk = rk = b − Axk, so (A + δAk) xk = b + δbk, and

‖δAk‖2 ≤ O(ε)‖A‖2,  ‖δbk‖2 ≤ O(ε)‖b‖2.   Q.E.D.
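This construction is easy to verify numerically. A minimal sketch (the matrix, right-hand side and approximate solution below are arbitrary illustrations, not the talk's data):

```python
import numpy as np

def rigal_gaches_perturbations(A, b, xk):
    """Build delta A_k and delta b_k exactly as in the proof above."""
    rk = b - A @ xk
    normA = np.linalg.norm(A, 2)
    normb = np.linalg.norm(b)
    normx = np.linalg.norm(xk)
    denom = normb + normA * normx
    dA = (normA * normx / denom) * np.outer(rk, xk) / normx**2
    db = -(normb / denom) * rk
    return dA, db, rk

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
b = rng.standard_normal(6)
xk = np.linalg.solve(A, b) + 1e-8 * rng.standard_normal(6)  # an approximate solution

dA, db, rk = rigal_gaches_perturbations(A, b, xk)
nrbe = np.linalg.norm(rk) / (np.linalg.norm(b) + np.linalg.norm(A, 2) * np.linalg.norm(xk))

print(np.linalg.norm((A + dA) @ xk - (b + db)))   # ~0: (A + dA) xk = b + db holds
print(np.linalg.norm(dA, 2) / np.linalg.norm(A, 2),
      np.linalg.norm(db) / np.linalg.norm(b),
      nrbe)                                        # all three equal the NRBE
```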

Pages 32–35: Summary (2-norm case)

Stopping criterion: STOP IF

NRBE = ‖b − Axk‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2) ≤ O(ε).

• A backward stable solution will trigger this stopping criterion.

• If this stopping criterion is triggered, we have a backward stable solution. Optimal! (Minimum number of steps for the chosen O(ε).)

• So use this stopping criterion for backward stable algorithms (with data accurate to O(ε)).

Pages 36–37: A BS iterative computation

A is FS1836 from the Matrix Market: 183 × 183, 1069 entries, real unsymmetric. Condition number κ2(A) ≈ 2 × 10^11.

(Chemical kinetics problem from atmospheric pollution studies. Alan Curtis, AERE Harwell, 1983.)

Solve two artificial test problems:

1:  x = e = (1, . . . , 1)ᵀ,  b := Ae;
2:  b := e;

with the initial approximation x0 = 0 (there must always be a good reason for using a nonzero x0!).
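One possible setup for reproducing these two test problems in SciPy (assuming fs_183_6.mtx, the Matrix Market file for FS1836, has already been downloaded; the MGS-GMRES sketch given after slide 58 can then be run on either right-hand side):

```python
import numpy as np
from scipy.io import mmread     # Matrix Market reader

# assumption: "fs_183_6.mtx" is the locally downloaded Matrix Market file for FS1836
A = mmread("fs_183_6.mtx").tocsr()
n = A.shape[0]                  # 183

e = np.ones(n)
b1 = A @ e                      # test problem 1: exact solution x = e
b2 = e.copy()                   # test problem 2: b = e
x0 = np.zeros(n)                # initial approximation x0 = 0
```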

Page 38:

In the following two graphical slides, concentrate on the two plots that decrease immediately (↘).

The ↗ plot denotes (loss of) orthogonality in MGS-GMRES.

The − − − − plots denote singular values of supposedly orthonormal matrices Vk, and are of negligible interest to a general audience, but crucial to the numerical stability of MGS-GMRES.

Page 39: [Figure]

Test problem 1:  ‖rk‖2/‖b‖2 (dotted);  ‖rk‖2/(‖b‖2 + ‖A‖2‖xk‖2) (solid).
MGS-GMRES implementation, b = A*ones(183,1).
FS1836(183), cond(A) = 1.7367e+11.
(Plot of backward error, relative residual norms, and orthogonality against iteration number 0 to 100, on a log scale from 10^{-18} to 10^2.)

Page 40: [Figure]

Test problem 2:  ‖rk‖2/‖b‖2 (dotted);  ‖rk‖2/(‖b‖2 + ‖A‖2‖xk‖2) (solid).
MGS-GMRES implementation, b = ones(183,1).
FS1836(183), cond(A) = 1.7367e+11.
(Plot of backward error, relative residual norms, and orthogonality against iteration number 0 to 100, on a log scale from 10^{-18} to 10^2.)

Pages 41–43:

We see that ‖rk‖2/‖b‖2 can be VERY misleading.

But the normwise relative backward error

NRBE = ‖b − Axk‖2 / (‖b‖2 + ‖A‖2 ‖xk‖2)

is EXCELLENT, theoretically and computationally.

It reaches a low at k ≈ 45 for n = 183, and could then increase! A good stopping criterion is very important.

Similar ideas can apply to iterative methods for other problems, e.g. nonlinear equations (NLE), the SVD, eigenvalue problems (EVP), . . .

Pages 44–48: Inaccurate Data? Stop Early!

Usually the given A and b are only approximations to ideal data which we do not know. Suppose we know α and β such that the ideal data can be written as A + δA and b + δb, where

‖δA‖2 ≤ α‖A‖2,  ‖δb‖2 ≤ β‖b‖2.   (∗)

Stopping criterion:

‖b − Axk‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) ≤ 1.

NOTE: Here " ≤ 1 ". Previously " ≤ O(ε) ". Now the "accuracy measures" are α and β, and they appear in the denominator.

Justification for the stopping criterion: If

‖b − Axk‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) ≤ 1,

then there exist δAk, δbk satisfying (∗) such that

(A + δAk) xk = b + δbk.

That is, xk is the exact answer to a problem that could have been the ideal one.
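For illustration, this stop-early test is a small change from the earlier NRBE sketch (α and β are the user's estimates of the relative data accuracy; the exact 2-norm of A is used here for simplicity):

```python
import numpy as np

def stop_early(A, b, xk, alpha, beta):
    """Return True when xk exactly solves some problem within the
    data accuracy alpha (for A) and beta (for b), as on the slide."""
    rk = b - A @ xk
    denom = (beta * np.linalg.norm(b)
             + alpha * np.linalg.norm(A, 2) * np.linalg.norm(xk))
    return np.linalg.norm(rk) <= denom     # i.e. the displayed ratio is <= 1
```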

Page 49: Proof

If NRBE = ‖b − Axk‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) ≤ 1, take

δAk = { α‖A‖2 ‖xk‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) } · rk xkᵀ / ‖xk‖2²,

δbk = − { β‖b‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) } · rk.

Then δAk xk − δbk = rk = b − Axk, so (A + δAk) xk = b + δbk, and

‖δAk‖2 ≤ α‖A‖2,  ‖δbk‖2 ≤ β‖b‖2.   Q.E.D.

Pages 50–53: Computing ‖A‖2?

α = β = O(ε) gives the standard 2-norm BS criterion.

Write µk(ν) ≡ ‖b − Axk‖2 / (β‖b‖2 + α ν ‖xk‖2).

Eventually we want ν = ‖A‖2 and µk(ν) ≤ 1.

Many iterative methods produce a matrix Bk at step k such that, to high accuracy, ‖Bk‖2 ↗ ‖A‖2 (almost always). In this case, although we do not always have ‖Bk‖F → ‖A‖F, use the initial criterion µk(‖Bk‖F) ≤ 1. When that is met, estimate νk ≈ ‖Bk‖2 at this and further steps using some fast method, until µk(νk) ≤ 1.

Bk is structured; for example: GMRES: upper Hessenberg; LSQR: bidiagonal; SYMMLQ & MINRES & CG: tridiagonal.
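One possible realization of this two-stage test, assuming Bk is the small (k+1)×k Hessenberg matrix Hk produced by GMRES, so both of its norms are cheap to compute directly (a fast estimator could replace the exact 2-norm):

```python
import numpy as np

def mu(r_norm, b_norm, x_norm, nu, alpha, beta):
    """mu_k(nu) = ||b - A x_k||_2 / (beta ||b||_2 + alpha nu ||x_k||_2)."""
    return r_norm / (beta * b_norm + alpha * nu * x_norm)

def two_stage_test(r_norm, b_norm, x_norm, Hk, alpha, beta):
    """Two-stage test from the slides, with B_k = H_k.

    Since ||H_k||_F >= ||H_k||_2, passing mu_k(||H_k||_F) <= 1 is necessary
    before mu_k(||H_k||_2) <= 1 can hold, so the cheap Frobenius test screens
    out early steps."""
    if mu(r_norm, b_norm, x_norm, np.linalg.norm(Hk, 'fro'), alpha, beta) > 1.0:
        return False
    nu_k = np.linalg.norm(Hk, 2)   # sigma_1(H_k); H_k is only (k+1) x k
    # in exact arithmetic ||H_k||_2 <= ||A||_2, so using nu_k errs on the safe side
    return mu(r_norm, b_norm, x_norm, nu_k, alpha, beta) <= 1.0
```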

Page 54: Direct use of ‖A‖F?

For the 2-norm case, the Rigal & Gaches minimal perturbations here are

δAk = { α‖A‖2 ‖xk‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) } · rk xkᵀ / ‖xk‖2²,

δbk = − { β‖b‖2 / (β‖b‖2 + α‖A‖2 ‖xk‖2) } · rk.

δAk is a rank-one matrix, so ‖δAk‖2 = ‖δAk‖F. And if ‖A‖2 is replaced by ‖A‖F, they showed that the theory remains valid. (They proved results for other norms too.) Consequently:

Page 55: A useful variant

η_{α,β}(xk) ≡ ‖b − Axk‖2 / (β‖b‖2 + α‖A‖F ‖xk‖2)
           = min over η, δA, δb of { η : (A + δA) xk = b + δb,  ‖δA‖F ≤ η α‖A‖F,  ‖δb‖2 ≤ η β‖b‖2 }.

This gives the directly applicable NRBE′ criterion based on the Frobenius matrix norm.

Pages 56–57: MGS-GMRES for Ax = b, A ∈ R^{n×n}

GMRES: the "Generalized Minimum Residual" algorithm to solve Ax = b, A ∈ R^{n×n} nonsingular.
Y. Saad & M. H. Schultz, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856–869.
Based on the algorithm by W. Arnoldi, Quart. Appl. Math., 9 (1951), pp. 17–29.

The Modified Gram-Schmidt version (MGS-GMRES) is efficient, but loses orthogonality.

Some practitioners avoid it, or use reorthogonalization (e.g. Matlab). Is this necessary?

Page 58: MGS-GMRES for Ax = b, A ∈ R^{n×n}

Take ρ ≡ ‖b‖2, v1 ≡ b/ρ; generate the columns of Vj+1 ≡ [v1, . . . , vj+1] via the (MGS) Arnoldi algorithm:

AVj = Vj+1 Hj+1,j,   Vj+1ᵀ Vj+1 = Ij+1.   (∗)

The approximate solution xj ≡ Vj yj has residual

rj ≡ b − Axj = b − AVj yj = v1 ρ − Vj+1 Hj+1,j yj = Vj+1 (e1 ρ − Hj+1,j yj).

The minimum residual is found by taking

yj ≡ arg min over y of { ‖b − AVj y‖2 = ‖e1 ρ − Hj+1,j y‖2 }.   (∗)

(∗) DIFFICULTY: the computed Vj+1ᵀ Vj+1 ≠ Ij+1.
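For concreteness, a compact NumPy sketch of MGS-GMRES following these recurrences, with the Frobenius-norm NRBE′ stopping test (an illustrative implementation for a dense A, not the authors' code; the true residual is formed every step here only for simplicity):

```python
import numpy as np

def mgs_gmres(A, b, tol=1e-14, maxiter=None):
    """MGS-GMRES (no restarts) for dense A, stopped by the NRBE' test
    ||b - A x_j||_2 <= tol (||b||_2 + ||A||_F ||x_j||_2)."""
    A = np.asarray(A, dtype=float)
    n = len(b)
    maxiter = maxiter if maxiter is not None else n
    normA_F = np.linalg.norm(A, 'fro')
    normb = np.linalg.norm(b)

    rho = normb
    V = np.zeros((n, maxiter + 1))          # Arnoldi vectors v_1, ..., v_{j+1}
    H = np.zeros((maxiter + 1, maxiter))    # Hessenberg H_{j+1,j}
    V[:, 0] = b / rho

    for j in range(maxiter):
        # MGS-Arnoldi step: orthogonalize A v_j against v_1, ..., v_j
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 0:
            V[:, j + 1] = w / H[j + 1, j]

        # small least squares problem: y_j = argmin || e_1 rho - H_{j+1,j} y ||_2
        e1rho = np.zeros(j + 2)
        e1rho[0] = rho
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1rho, rcond=None)
        xj = V[:, :j + 1] @ y

        # NRBE' test with the true residual (in practice the recurred residual
        # norm would be monitored, forming the true residual only near the end)
        rj = b - A @ xj
        nrbe = np.linalg.norm(rj) / (normb + normA_F * np.linalg.norm(xj))
        if nrbe <= tol:
            break
    return xj, j + 1, nrbe
```

For the FS1836 experiments, A.toarray() and either b1 or b2 from the earlier setup sketch could be passed in, e.g. x, k, nrbe = mgs_gmres(A.toarray(), b1, tol=1e-14).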

Pages 59–61: Stability of MGS-GMRES

For some k ≤ n, the MGS-GMRES method is backward stable for computing a solution xk to

Ax = b,  A ∈ R^{n×n},  σmin(A) ≫ n²ε‖A‖F;

as well as intermediate solutions yj to the linear least squares problems (LLSPs)

min over y of ‖b − AVj y‖2,  j = 1, . . . , k,

where xj ≡ fl(Vj yj).

"Modified Gram-Schmidt (MGS), Least Squares, and backward stability of MGS-GMRES",
C. C. Paige, M. Rozložník, and Z. Strakoš, SIAM J. Matrix Anal. Appl., Vol. 28, No. 1, 2006, pp. 264–284.

Page 62: Stability of MGS-GMRES, ctd.

For some k ≤ n, the MGS-GMRES method is backward stable for computing a solution xk to

Ax = b,  A ∈ R^{n×n},  σmin(A) ≫ n²ε‖A‖F,

in that, for some step k ≤ n and some reasonable constant c, the computed solution xk satisfies

(A + δAk) xk = b + δbk,
‖δAk‖F ≤ c k n ε ‖A‖F,  ‖δbk‖2 ≤ c k n ε ‖b‖2.

So we can use the F-norm NRBE′ stopping criterion!

Pages 63–69: Conclusions. Solving Ax = b.

For a sufficiently nonsingular matrix, e.g.

σmin(A) ≫ n²ε‖A‖F

(this is "rigorous", but unnecessarily restrictive; a more practical requirement might be: for large n, σmin(A) ≥ 10 n ε ‖A‖F),

• we can happily use the efficient variant MGS-GMRES of the GMRES method,
• without reorthogonalization,
• but with the optimal NRBE′ stopping criterion,
• since MGS-GMRES for Ax = b is a backward stable iterative method.