CG VERSUS MINRES: AN EMPIRICAL COMPARISON · 2011-10-26 · CG VERSUS MINRES: AN EMPIRICAL...


CG VERSUS MINRES: AN EMPIRICAL COMPARISON∗

DAVID CHIN-LUNG FONG† AND MICHAEL SAUNDERS‡

Abstract. For iterative solution of symmetric systems $Ax = b$, the conjugate gradient method (CG) is commonly used when $A$ is positive definite, while the minimum residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of approximate solutions $x_k$ generated by each method and suggest that even if $A$ is positive definite, MINRES may be preferable to CG if iterations are to be terminated early. In particular, we show for MINRES that the solution norms $\|x_k\|$ are monotonically increasing when $A$ is positive definite (as was already known for CG), and the solution errors $\|x^* - x_k\|$ are monotonically decreasing. We also show that the backward errors for the MINRES iterates $x_k$ are monotonically decreasing.

Key words. conjugate gradient method, minimum residual method, iterative method, sparse matrix, linear equations, CG, CR, MINRES, Krylov subspace method, trust-region method

1. Introduction. The conjugate gradient method (CG) [9] and the minimum residual method (MINRES) [16] are both Krylov subspace methods for the iterative solution of symmetric linear equations $Ax = b$. CG is commonly used when the matrix $A$ is positive definite, while MINRES is generally reserved for indefinite systems [24, p85]. We reexamine this wisdom from the point of view of early termination on positive-definite systems.

We assume that the system $Ax = b$ is real with $A$ symmetric positive definite (spd) and of dimension $n \times n$. The Lanczos process [11] with starting vector $b$ may be used to generate the $n \times k$ matrix $V_k \equiv (v_1 \; v_2 \; \dots \; v_k)$ and the $(k+1) \times k$ Hessenberg tridiagonal matrix $T_k$ such that $AV_k = V_{k+1}T_k$ for $k = 1, 2, \dots, \ell$ and $AV_\ell = V_\ell T_\ell$ for some $\ell \le n$, where the columns of $V_k$ form a theoretically orthonormal basis for the kth Krylov subspace $\mathcal{K}_k(A, b) \equiv \mathrm{span}\{b, Ab, A^2b, \dots, A^{k-1}b\}$, and $T_\ell$ is $\ell \times \ell$ and tridiagonal. Approximate solutions within the kth Krylov subspace may be formed as $x_k = V_k y_k$ for some $k$-vector $y_k$. As shown in [16], three iterative methods CG, MINRES, and SYMMLQ may be derived by choosing $y_k$ appropriately at each iteration. CG is well defined if $A$ is spd, while MINRES and SYMMLQ are stable for any symmetric nonsingular $A$.
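The Lanczos relation $AV_k = V_{k+1}T_k$ can be checked numerically. The following is a minimal sketch, assuming dense NumPy arrays, no breakdown in the first $k$ steps, and no reorthogonalization; the function name `lanczos` and the random test matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def lanczos(A, b, k):
    """Run k steps of the symmetric Lanczos process started from b.

    Returns V (n x (k+1)) and the (k+1) x k tridiagonal T such that
    A @ V[:, :k] == V @ T up to rounding. Assumes no breakdown (k < ell).
    """
    n = b.shape[0]
    V = np.zeros((n, k + 1))
    T = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w -= T[j - 1, j] * V[:, j - 1]   # subtract the previous-vector component
        T[j, j] = V[:, j] @ w                # diagonal entry
        w -= T[j, j] * V[:, j]
        beta = np.linalg.norm(w)
        T[j + 1, j] = beta                   # subdiagonal entry
        if j + 1 < k:
            T[j, j + 1] = beta               # matching superdiagonal entry
        V[:, j + 1] = w / beta
    return V, T

# Illustrative spd test problem (hypothetical data).
rng = np.random.default_rng(0)
n, k = 20, 5
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)
b = rng.standard_normal(n)
V, T = lanczos(A, b, k)
print(np.linalg.norm(A @ V[:, :k] - V @ T))  # expected to be near machine precision
```

Every method discussed here builds its iterate as $x_k = V_k y_k$ from these same two matrices; only the rule for choosing $y_k$ differs.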

As noted by Choi [2], SYMMLQ can form an approximation $x_{k+1} = V_{k+1}y_{k+1}$ in the $(k+1)$th Krylov subspace when CG and MINRES are forming their approximations $x_k = V_k y_k$ in the kth subspace. It would be of future interest to compare all three methods on spd systems, but for the remainder of this paper we focus on CG and MINRES.

With different methods using the same information $V_{k+1}$ and $T_k$ to compute solution estimates $x_k = V_k y_k$ within the same Krylov subspace (for each $k$), it is commonly thought that the number of iterations required will be similar for each method, and hence CG should be preferable on spd systems because it requires somewhat fewer floating-point operations per iteration. This view is justified if an accurate solution is required (stopping tolerance $\tau$ close to machine precision $\epsilon$). We show that with looser stopping tolerances, MINRES is sure to terminate sooner than CG when the stopping rule is based on the backward error for $x_k$, and by numerical examples we illustrate that the difference in iteration numbers can be substantial.

∗Report SOL 2011-2. Submitted October 19, 2011 to SQU Journal for Science.

†ICME, Stanford University, CA 94305-4042, USA ([email protected]). Partially supported by a Stanford Graduate Fellowship.

‡Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, CA 94305-4026, USA ([email protected]). Partially supported by Office of Naval Research grant N00014-08-1-0191 and by the U.S. Army Research Laboratory, through the Army High Performance Computing Research Center, Cooperative Agreement W911NF-07-0027.

1.1. Notation. We study the application of CG and MINRES to real symmetric positive-definite (spd) systems $Ax = b$. The unique solution is denoted by $x^*$. The initial approximate solution is $x_0 \equiv 0$, and $r_k \equiv b - Ax_k$ is the residual vector for an approximation $x_k$ within the kth Krylov subspace. For a vector $v$ and matrix $A$, $\|v\|$ and $\|A\|$ denote the 2-norm and the Frobenius norm respectively, and $A \succ 0$ indicates that $A$ is spd.

2. Minimization properties of Krylov subspace methods. With exact arithmetic, the Lanczos process terminates with $k = \ell$ for some $\ell \le n$. To ensure that the approximations $x_k = V_k y_k$ improve by some measure as $k$ increases toward $\ell$, the Krylov solvers minimize some convex function within the expanding Krylov subspaces [8].

2.1. CG. When $A$ is spd, the quadratic form $\phi(x) \equiv \tfrac{1}{2}x^TAx - b^Tx$ is bounded below, and its unique minimizer solves $Ax = b$. A characterization of the CG iterations is that they minimize the quadratic form within each Krylov subspace [8], [15, §2.4], [25, §§8.8–8.9]:

\[ x_k^C = V_k y_k^C, \quad \text{where} \quad y_k^C = \arg\min_y \phi(V_k y). \]

With $b = Ax^*$ and $2\phi(x_k) = x_k^TAx_k - 2x_k^TAx^*$, this is equivalent to minimizing the function $\|x^* - x_k\|_A \equiv \sqrt{(x^* - x_k)^TA(x^* - x_k)}$, known as the energy norm of the error, within each Krylov subspace. For some applications, this is a desirable property [19, 22, 1, 15, 25].
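The equivalence stated above follows from expanding the squared energy norm and substituting $b = Ax^*$. The short derivation below spells out that step; it uses only the symbols already defined in this section.

```latex
\begin{aligned}
\|x^* - x_k\|_A^2
  &= (x^* - x_k)^T A (x^* - x_k) \\
  &= x_k^T A x_k - 2 x_k^T A x^* + x^{*T} A x^* \\
  &= 2\phi(x_k) + x^{*T} A x^* .
\end{aligned}
```

The last term does not depend on $x_k$, so minimizing $\phi$ over the kth Krylov subspace and minimizing the energy norm of the error over that subspace select the same $x_k$.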

2.2. MINRES. For nonsingular (and possibly indefinite) systems, the residual norm was used in [16] to characterize the MINRES iterations:

\[ x_k^M = V_k y_k^M, \quad \text{where} \quad y_k^M = \arg\min_y \|b - AV_k y\|. \tag{2.1} \]

Thus, MINRES minimizes $\|r_k\|$ within the kth Krylov subspace. This was also an aim of Stiefel's Conjugate Residual method (CR) [21] for spd systems (and of Luenberger's extensions of CR to indefinite systems [13, 14]). Thus, CR and MINRES must generate the same iterates on spd systems. We use this connection to prove that $\|x_k\|$ increases monotonically when MINRES is applied to an spd system.

2.3. CG and CR. The two methods for solving spd systems $Ax = b$ are summarized in Table 2.1. The first two columns are pseudocodes for CG and CR with iteration number $k$ omitted for clarity; they match our Matlab implementations. Note that $q = Ap$ in both methods, but it is not computed as such in CR. Termination occurs when $r = 0$ ($\Rightarrow \rho = \beta = 0$).

To prove our main result we need to introduce iteration indices; see column 3 of Table 2.1. Termination occurs when $r_k = 0$ for some index $k = \ell \le n$ ($\Rightarrow \rho_\ell = \beta_\ell = 0$, $r_\ell = s_\ell = p_\ell = q_\ell = 0$). Note: This $\ell$ is the same as the $\ell$ at which the Lanczos process theoretically terminates for the given $A$ and $b$.

Theorem 2.1. The following properties hold for Algorithm CR:
(a) $q_i^T q_j = 0$ ($i \ne j$)
(b) $r_i^T q_j = 0$ ($i \ge j + 1$)


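Table 2.1. Pseudocode for algorithms CG and CR.

  CG                       CR                          CR (with iteration indices)
  -----------------------  --------------------------  ---------------------------------------
  Initialize               Initialize                  Initialize
  x = 0, r = b             x = 0, r = b, s = Ar,       x_0 = 0, r_0 = b, s_0 = A r_0,
  ρ = r^T r, p = r         ρ = r^T s, p = r, q = s     ρ_0 = r_0^T s_0, p_0 = r_0, q_0 = s_0
  Repeat                   Repeat                      For k = 1, 2, ...
    q = Ap                   (q = Ap)                    (q_{k-1} = A p_{k-1})
    α = ρ / p^T q            α = ρ / ‖q‖^2               α_k = ρ_{k-1} / ‖q_{k-1}‖^2
    x ← x + αp               x ← x + αp                  x_k = x_{k-1} + α_k p_{k-1}
    r ← r − αq               r ← r − αq                  r_k = r_{k-1} − α_k q_{k-1}
                             s = Ar                      s_k = A r_k
    ρ̄ = ρ, ρ = r^T r         ρ̄ = ρ, ρ = r^T s            ρ_k = r_k^T s_k
    β = ρ / ρ̄                β = ρ / ρ̄                   β_k = ρ_k / ρ_{k-1}
    p ← r + βp               p ← r + βp                  p_k = r_k + β_k p_{k-1}
                             q ← s + βq                  q_k = s_k + β_k q_{k-1}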
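The first two columns of Table 2.1 translate almost line for line into NumPy. The following is a minimal sketch, assuming a dense spd matrix and a relative residual stopping test; the function names `cg` and `cr`, the `tol` and `maxit` parameters, and the random test matrix are illustrative and are not the authors' Matlab code.

```python
import numpy as np

def cg(A, b, tol=1e-10, maxit=None):
    """CG as in column 1 of Table 2.1; returns the list of iterates x_1, x_2, ..."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()
    rho = r @ r
    p = r.copy()
    iterates = []
    for _ in range(maxit or len(b)):
        q = A @ p
        alpha = rho / (p @ q)
        x = x + alpha * p
        r = r - alpha * q
        rho_bar, rho = rho, r @ r
        iterates.append(x.copy())
        if np.sqrt(rho) <= tol * np.linalg.norm(b):   # illustrative stopping test
            break
        beta = rho / rho_bar
        p = r + beta * p
    return iterates

def cr(A, b, tol=1e-10, maxit=None):
    """CR as in column 2 of Table 2.1; q = A p is updated by recurrence."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()
    s = A @ r
    rho = r @ s
    p, q = r.copy(), s.copy()
    iterates = []
    for _ in range(maxit or len(b)):
        alpha = rho / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        iterates.append(x.copy())
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        s = A @ r
        rho_bar, rho = rho, r @ s
        beta = rho / rho_bar
        p = r + beta * p
        q = s + beta * q
    return iterates

# Illustrative spd test problem (hypothetical data).
rng = np.random.default_rng(0)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)
b = rng.standard_normal(n)
x_true = np.linalg.solve(A, b)
print(len(cg(A, b)), len(cr(A, b)))   # iteration counts for this example
```

Both routines use one product with $A$ per iteration; CR carries the extra vectors $s$ and $q$, which is the source of its slightly higher cost per iteration.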

Proof. Given in [14, Theorem 1].

Theorem 2.2. The following properties hold for Algorithm CR:
(a) $\alpha_i \ge 0$
(b) $\beta_i \ge 0$
(c) $p_i^T q_j \ge 0$
(d) $p_i^T p_j \ge 0$
(e) $x_i^T p_j \ge 0$
(f) $r_i^T p_j \ge 0$

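Proof.

(a) Here we use the fact that $A$ is spd. The inequalities are strict until $i = \ell$ (and $r_\ell = 0$).

\[ \rho_i = r_i^T s_i = r_i^T A r_i \ge 0 \quad (A \succ 0) \tag{2.2} \]

\[ \alpha_i = \rho_{i-1} / \|q_{i-1}\|^2 \ge 0 \]

(b) And again:

\[ \beta_i = \rho_i / \rho_{i-1} \ge 0 \quad \text{(by (2.2))} \]

(c) Case I: $i = j$

\[ p_i^T q_i = p_i^T A p_i \ge 0 \quad (A \succ 0) \]

Case II: $i - j = k > 0$

\[
\begin{aligned}
p_i^T q_j = p_i^T q_{i-k} &= r_i^T q_{i-k} + \beta_i p_{i-1}^T q_{i-k} \\
&= \beta_i p_{i-1}^T q_{i-k} \quad \text{(by Thm 2.1 (b))} \\
&\ge 0,
\end{aligned}
\]

where $\beta_i \ge 0$ by (b) and $p_{i-1}^T q_{i-k} \ge 0$ by induction as $(i-1) - (i-k) = k - 1 < k$.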


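Case III: $j - i = k > 0$

\[
\begin{aligned}
p_i^T q_j = p_i^T q_{i+k} &= p_i^T A p_{i+k} \\
&= p_i^T A (r_{i+k} + \beta_{i+k} p_{i+k-1}) \\
&= q_i^T r_{i+k} + \beta_{i+k} p_i^T q_{i+k-1} \\
&= \beta_{i+k} p_i^T q_{i+k-1} \quad \text{(by Thm 2.1 (b))} \\
&\ge 0,
\end{aligned}
\]

where $\beta_{i+k} \ge 0$ by (b) and $p_i^T q_{i+k-1} \ge 0$ by induction as $(i+k-1) - i = k - 1 < k$.

(d) At termination, define $\mathcal{P} \equiv \mathrm{span}\{p_0, p_1, \dots, p_{\ell-1}\}$ and $\mathcal{Q} \equiv \mathrm{span}\{q_0, \dots, q_{\ell-1}\}$. By construction, $\mathcal{P} = \mathrm{span}\{b, Ab, \dots, A^{\ell-1}b\}$ and $\mathcal{Q} = \mathrm{span}\{Ab, \dots, A^{\ell}b\}$ (since $q_i = Ap_i$). Again by construction, $x_\ell \in \mathcal{P}$, and since $r_\ell = 0$ we have $b = Ax_\ell \Rightarrow b \in \mathcal{Q}$. We see that $\mathcal{P} \subseteq \mathcal{Q}$. By Theorem 2.1(a), $\{q_i/\|q_i\|\}_{i=0}^{\ell-1}$ forms an orthonormal basis for $\mathcal{Q}$. If we project $p_i \in \mathcal{P} \subseteq \mathcal{Q}$ onto this basis, we have

\[ p_i = \sum_{k=0}^{\ell-1} \frac{p_i^T q_k}{q_k^T q_k} \, q_k, \]

where all coordinates are non-negative from (c). Similarly for any other $p_j$, $j < \ell$. Therefore $p_i^T p_j \ge 0$ for any $i, j < \ell$.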

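(e) By construction,

\[ x_i = x_{i-1} + \alpha_i p_{i-1} = \cdots = \sum_{k=1}^{i} \alpha_k p_{k-1} \quad (x_0 = 0). \]

Therefore $x_i^T p_j \ge 0$ by (d) and (a).

(f) Note that any $r_i$ can be expressed as a sum of the $q_i$:

\[
\begin{aligned}
r_i &= r_{i+1} + \alpha_{i+1} q_i \\
&= \cdots \\
&= r_\ell + \alpha_\ell q_{\ell-1} + \cdots + \alpha_{i+1} q_i \\
&= \alpha_\ell q_{\ell-1} + \cdots + \alpha_{i+1} q_i.
\end{aligned}
\]

Thus we have

\[ r_i^T p_j = (\alpha_\ell q_{\ell-1} + \cdots + \alpha_{i+1} q_i)^T p_j \ge 0, \]

where the inequality follows from (a) and (c).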

We are now able to prove our main theorem about the monotonic increase of $\|x_k\|$ for CR and MINRES. A similar result was proved for CG by Steihaug [19]. We know that $\|x_k\|$ is also monotonic for LSQR [17].

Theorem 2.3. For CR (and hence MINRES) on an spd system $Ax = b$, $\|x_k\|$ increases monotonically.

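Proof. Since $x_i = x_{i-1} + \alpha_i p_{i-1}$, we have $\|x_i\|^2 - \|x_{i-1}\|^2 = 2\alpha_i x_{i-1}^T p_{i-1} + \alpha_i^2 p_{i-1}^T p_{i-1} \ge 0$, where the last inequality follows from Theorem 2.2 (a), (d) and (e). Therefore $\|x_i\| \ge \|x_{i-1}\|$.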

Theorem 2.4. For CR (and hence MINRES) on an spd system $Ax = b$, the error $\|x^* - x_k\|$ decreases monotonically.


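Proof. From the update rule for $x_k$, we can express the final solution $x_\ell = x^*$ as

\[
\begin{aligned}
x_\ell &= x_{\ell-1} + \alpha_\ell p_{\ell-1} \\
&= \cdots \\
&= x_k + \alpha_{k+1} p_k + \cdots + \alpha_\ell p_{\ell-1} \\
&= x_{k-1} + \alpha_k p_{k-1} + \alpha_{k+1} p_k + \cdots + \alpha_\ell p_{\ell-1}.
\end{aligned}
\]

Using the last two equalities above, we can write

\[
\begin{aligned}
\|x_\ell - x_{k-1}\|^2 - \|x_\ell - x_k\|^2
&= (x_\ell - x_{k-1})^T (x_\ell - x_{k-1}) - (x_\ell - x_k)^T (x_\ell - x_k) \\
&= 2\alpha_k p_{k-1}^T (\alpha_{k+1} p_k + \cdots + \alpha_\ell p_{\ell-1}) + \alpha_k^2 p_{k-1}^T p_{k-1} \\
&\ge 0,
\end{aligned}
\]

where the last inequality follows from Theorem 2.2 (a), (d).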
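Theorems 2.3 and 2.4 can be observed numerically on a small example. The sketch below runs the indexed CR recurrence of Table 2.1 and records $\|x_k\|$ and $\|x^* - x_k\|$; the function name `cr_history`, the random spd matrix, and the stopping tolerance are assumptions made for illustration, and in floating-point arithmetic the monotonicity can only be expected up to rounding.

```python
import numpy as np

def cr_history(A, b, x_star, tol=1e-10):
    """Run CR and record ||x_k|| and ||x* - x_k|| for k = 0, 1, 2, ..."""
    x = np.zeros_like(b)
    r = b.copy()
    s = A @ r
    rho = r @ s
    p, q = r.copy(), s.copy()
    xnorm = [0.0]
    err = [np.linalg.norm(x_star)]
    for _ in range(len(b)):
        alpha = rho / (q @ q)
        x += alpha * p
        r -= alpha * q
        xnorm.append(np.linalg.norm(x))
        err.append(np.linalg.norm(x_star - x))
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        s = A @ r
        rho, rho_old = r @ s, rho
        beta = rho / rho_old
        p = r + beta * p
        q = s + beta * q
    return np.array(xnorm), np.array(err)

# Illustrative spd test problem (hypothetical data).
rng = np.random.default_rng(0)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

xnorm, err = cr_history(A, b, x_star)
# Monotonicity checks with a small allowance for rounding error.
print(np.all(np.diff(xnorm) >= -1e-10), np.all(np.diff(err) <= 1e-10))
```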

3. Backward error analysis. For many physical problems requiring numerical solution, we are given inexact or uncertain input data (in this case $A$ and/or $b$). It is not justifiable to seek a solution beyond the accuracy of the data [6]. Instead, it is more reasonable to stop an iterative solver once we know that the current approximate solution solves a nearby problem. The measure of "nearby" should match the error in the input data. The design of such stopping rules is an important application of backward error analysis.

For a consistent linear system $Ax = b$, we think of $x_k$ coming from the kth iteration of one of the iterative solvers. Following Titley-Peloquin [23] we say that $x_k$ is an acceptable solution if and only if there exist perturbations $E$ and $f$ satisfying

\[ (A + E)x_k = b + f, \quad \frac{\|E\|}{\|A\|} \le \alpha, \quad \frac{\|f\|}{\|b\|} \le \beta \tag{3.1} \]

for some tolerances $\alpha \ge 0$, $\beta \ge 0$ that reflect the (preferably known) accuracy of the data. We are naturally interested in minimizing the size of $E$ and $f$. If we define the optimization problem

\[ \min_{\xi, E, f} \; \xi \quad \text{s.t.} \quad (A + E)x_k = b + f, \quad \frac{\|E\|}{\|A\|} \le \alpha\xi, \quad \frac{\|f\|}{\|b\|} \le \beta\xi \]

to have optimal solution $\xi_k$, $E_k$, $f_k$ (all functions of $x_k$, $\alpha$, and $\beta$), we see that $x_k$ is an acceptable solution if and only if $\xi_k \le 1$. We call $\xi_k$ the normwise relative backward error (NRBE) for $x_k$.

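With $r_k = b - Ax_k$, the optimal solution $\xi_k$, $E_k$, $f_k$ is shown in [23] to be

\[ \phi_k = \frac{\beta\|b\|}{\alpha\|A\|\|x_k\| + \beta\|b\|}, \qquad E_k = \frac{(1 - \phi_k)}{\|x_k\|^2} \, r_k x_k^T, \tag{3.2} \]

\[ \xi_k = \frac{\|r_k\|}{\alpha\|A\|\|x_k\| + \beta\|b\|}, \qquad f_k = -\phi_k r_k. \tag{3.3} \]

(See [10, p12] for the case $\beta = 0$ and [10, §7.1 and p336] for the case $\alpha = \beta$.)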
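Formulas (3.2)–(3.3) are easy to evaluate and to verify directly. The sketch below computes $\xi_k$, $E_k$, $f_k$ for a given approximate solution and checks that $(A + E_k)x_k = b + f_k$; the function name `nrbe`, the random data, and the tolerance values are illustrative assumptions, and the dense rank-one $E_k$ is formed only for the check.

```python
import numpy as np

def nrbe(A, b, x, alpha, beta):
    """Evaluate (3.2)-(3.3): return xi, E, f for a nonzero approximate solution x."""
    r = b - A @ x
    normA = np.linalg.norm(A, "fro")   # Frobenius norm, as in Section 1.1
    normx = np.linalg.norm(x)
    normb = np.linalg.norm(b)
    denom = alpha * normA * normx + beta * normb
    phi = beta * normb / denom
    E = (1.0 - phi) / normx**2 * np.outer(r, x)
    f = -phi * r
    xi = np.linalg.norm(r) / denom
    return xi, E, f

# Illustrative data (hypothetical): any symmetric A and any nonzero x will do.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M + M.T
b = rng.standard_normal(n)
x = rng.standard_normal(n)             # stand-in for an iterate x_k
alpha, beta = 1e-3, 1e-4               # illustrative tolerances
xi, E, f = nrbe(A, b, x, alpha, beta)
print(np.linalg.norm((A + E) @ x - (b + f)))   # expected to be near machine precision
```

At the optimum both constraints are active: $\|E_k\|/\|A\| = \alpha\xi_k$ and $\|f_k\|/\|b\| = \beta\xi_k$, which follows from (3.2)–(3.3) because $\|r_k x_k^T\| = \|r_k\|\|x_k\|$ in the Frobenius norm.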

3.1. Stopping rule. For general tolerances $\alpha$ and $\beta$, the condition $\xi_k \le 1$ for $x_k$ to be an acceptable solution becomes

\[ \|r_k\| \le \alpha\|A\|\|x_k\| + \beta\|b\|, \tag{3.4} \]

the stopping rule used in LSQR for consistent systems [17, p54, rule S1].


3.2. Monotonic backward errors. Of interest is the size of the perturbations to $A$ and $b$ for which $x_k$ is an exact solution of $Ax = b$. From (3.2)–(3.3), the perturbations have the following norms:

\[ \|E_k\| = (1 - \phi_k)\frac{\|r_k\|}{\|x_k\|} = \frac{\alpha\|A\|\|r_k\|}{\alpha\|A\|\|x_k\| + \beta\|b\|}, \tag{3.5} \]

\[ \|f_k\| = \phi_k\|r_k\| = \frac{\beta\|b\|\|r_k\|}{\alpha\|A\|\|x_k\| + \beta\|b\|}. \tag{3.6} \]

Since $\|x_k\|$ is monotonically increasing for CG and MINRES, we see from (3.2) that $\phi_k$ is monotonically decreasing for both solvers. Since $\|r_k\|$ is monotonically decreasing for MINRES (but not for CG), we have the following result.

Theorem 3.1. Suppose $\alpha > 0$ and $\beta > 0$ in (3.1). For CR and MINRES (but not CG), the relative backward errors $\|E_k\|/\|A\|$ and $\|f_k\|/\|b\|$ decrease monotonically.

Proof. This follows from (3.5)–(3.6) with $\|x_k\|$ increasing for both solvers and $\|r_k\|$ decreasing for CR and MINRES but not for CG.
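A small experiment illustrates Theorem 3.1. The sketch below runs a fixed number of CG and CR steps from Table 2.1 and evaluates $\xi_k$ from (3.3), which is proportional to both $\|E_k\|/\|A\|$ and $\|f_k\|/\|b\|$; the helper name `history`, the random spd matrix, the iteration count, and the tolerance values are all illustrative assumptions rather than the paper's test setup.

```python
import numpy as np

def history(A, b, method, iters):
    """Run `iters` steps of CG or CR; return arrays of ||r_k|| and ||x_k||."""
    x = np.zeros_like(b)
    r = b.copy()
    s = A @ r
    rho = r @ r if method == "cg" else r @ s
    p, q = r.copy(), s.copy()
    rnorm, xnorm = [], []
    for _ in range(iters):
        if method == "cg":
            q = A @ p                      # CG forms q = A p explicitly
            alpha = rho / (p @ q)
        else:
            alpha = rho / (q @ q)          # CR keeps q = A p by recurrence
        x = x + alpha * p
        r = r - alpha * q
        s = A @ r
        rho_old = rho
        rho = r @ r if method == "cg" else r @ s
        beta = rho / rho_old
        p = r + beta * p
        q = s + beta * q                   # only used by CR
        rnorm.append(np.linalg.norm(r))
        xnorm.append(np.linalg.norm(x))
    return np.array(rnorm), np.array(xnorm)

# Illustrative spd test problem (hypothetical data).
rng = np.random.default_rng(0)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)
b = rng.standard_normal(n)
alpha_tol = beta_tol = 1e-6                # illustrative alpha and beta in (3.1)
normA, normb = np.linalg.norm(A, "fro"), np.linalg.norm(b)

rn_cg, xn_cg = history(A, b, "cg", 15)
rn_cr, xn_cr = history(A, b, "cr", 15)
xi_cg = rn_cg / (alpha_tol * normA * xn_cg + beta_tol * normb)
xi_cr = rn_cr / (alpha_tol * normA * xn_cr + beta_tol * normb)
print(np.all(np.diff(xi_cr) <= 0), np.all(np.diff(xi_cg) <= 0))
```

On a well-conditioned example like this one, CG may also happen to give a decreasing sequence; the theorem only guarantees monotonicity for CR and MINRES.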

4. Numerical results. Here we compare the convergence of CG and MINRES on various spd systems $Ax = b$ and some associated indefinite systems $(A - \delta I)x = b$. The test examples are drawn from the University of Florida Sparse Matrix Collection (Davis [5]). We experimented with all 26 cases for which $A$ is real spd and $b$ is supplied.

Since $A$ is spd, we apply diagonal preconditioning by redefining $A$ and $b$ as follows: d = diag(A), D = diag(1./sqrt(d)), A ← DAD, b ← Db, b ← b/‖b‖. Thus in the figures below we have $\mathrm{diag}(A) = I$ and $\|b\| = 1$.
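The scaling step can be written in a few lines. This is a minimal sketch using dense NumPy arrays for readability; the function name `diag_scale` and the random test data are illustrative, and for the large sparse matrices used in the experiments one would apply the same scaling with a sparse matrix format instead.

```python
import numpy as np

def diag_scale(A, b):
    """Symmetric diagonal scaling: A <- D A D, b <- D b, then b <- b / ||b||."""
    d = np.diag(A)                      # positive because A is spd
    D = np.diag(1.0 / np.sqrt(d))
    A_scaled = D @ A @ D
    b_scaled = D @ b
    b_scaled = b_scaled / np.linalg.norm(b_scaled)
    return A_scaled, b_scaled

# Illustrative spd test problem (hypothetical data).
rng = np.random.default_rng(2)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
A_s, b_s = diag_scale(A, b)
print(np.diag(A_s), np.linalg.norm(b_s))
```

The solution of the scaled system, say $\tilde{x}$, relates to a solution of the original system through the same $D$: $D\tilde{x}$ solves $Ax = b/\|Db\|$.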

The stopping rule used for CG and MINRES was (3.4) with $\alpha = 0$ and $\beta = 10^{-8}$ (that is, $\|r_k\| \le 10^{-8}\|b\| = 10^{-8}$), but with a maximum of $n$ iterations.

4.1. Positive-definite systems. In defining backward errors, we assume for simplicity that $\alpha > 0$ and $\beta = 0$ in (3.1)–(3.3), even though this does not match the choice of $\alpha$ and $\beta$ in the stopping rule (3.4). This gives $\phi_k = 0$ and $\|E_k\| = \|r_k\|/\|x_k\|$ in (3.5). Thus, as in Theorem 3.1, we expect $\|E_k\|$ to decrease monotonically for CR and MINRES but not for CG.

Figure 4.1 shows the backward errors for four representative examples. We see that the $\|E_k\|$ values converge smoothly for MINRES while they fluctuate for CG. For every $k$, the MINRES backward error is smaller than that of CG, so that MINRES can always be stopped earlier than CG. These figures illustrate our belief that MINRES should be considered for spd systems when high accuracy is not required.

Figure 4.2 shows $\|r_k\|$ and $\|x_k\|$ for CG and MINRES on two typical spd examples. We see that $\|x_k\|$ is monotonically increasing for both solvers, and the $\|x_k\|$ values rise fairly rapidly to their limiting value $\|x^*\|$, with a moderate delay for MINRES.

Figure 4.3 shows $\|r_k\|$ and $\|x_k\|$ for CG and MINRES on two spd examples in which the residual decrease and the solution norm increase are somewhat slower than typical. The rise of $\|x_k\|$ for MINRES is rather more delayed. In the second case, if the stopping tolerance were $\beta = 10^{-6}$ rather than $\beta = 10^{-8}$, the final MINRES $\|x_k\|$ ($k \approx 10000$) would be less than half the exact value $\|x^*\|$. It will be of future interest to evaluate this effect within the context of trust-region methods for optimization.

[Figure 4.1: four panels, each plotting log(‖r‖/‖x‖) against iteration count for CG and MINRES. Upper left: Simon_raefsky4, Dim 19779x19779, nnz 1316789, id=7. Upper right: Cannizzo_sts4098, Dim 4098x4098, nnz 72356, id=13. Lower left: Schenk_AFE_af_shell8, Dim 504855x504855, nnz 17579155, id=11. Lower right: BenElechi_BenElechi1, Dim 245874x245874, nnz 13150496, id=22.]

Fig. 4.1. Comparison of backward errors for CG and MINRES solving four spd systems $Ax = b$ with $n$ = 19779, 4098, 504855, and 245874. The values of $\log_{10}(\|r_k\|/\|x_k\|)$ are plotted against iteration number $k$. These values define $\log_{10}(\|E_k\|)$ when the stopping tolerances in (3.4) are $\alpha > 0$ and $\beta = 0$.

Upper left: The MINRES backward error converges monotonically while the CG backward error oscillates toward convergence. Upper right: CG and MINRES both converge reasonably well, with MINRES staying ahead by 20–50 iterations.

Lower left: CG converges at the same speed as MINRES until near convergence, but slows down and takes twice as many iterations as MINRES to converge. Lower right: MINRES stays ahead of CG by two orders of magnitude during most iterations.

[Figure 4.2: four panels for CG and MINRES. Top row: log‖r‖ against iteration count. Bottom row: ‖x‖ against iteration count. Left column: Simon_olafu, Dim 16146x16146, nnz 1015156, id=6. Right column: Cannizzo_sts4098, Dim 4098x4098, nnz 72356, id=13.]

Fig. 4.2. Comparison of residual and solution norms for CG and MINRES solving two spd systems $Ax = b$ with $n$ = 16146 and 4098. These are typical examples.

Top: The values of $\log_{10}\|r_k\|$ are plotted against iteration number $k$. Bottom: The values of $\|x_k\|$ are plotted against $k$. The solution norms grow somewhat faster for CG than for MINRES. Both reach the limiting value $\|x^*\|$ significantly before $x_k$ is close to $x^*$.

[Figure 4.3: four panels for CG and MINRES. Top row: log‖r‖ against iteration count. Bottom row: ‖x‖ against iteration count. Left column: Schmid_thermal1, Dim 82654x82654, nnz 574458, id=14. Right column: BenElechi_BenElechi1, Dim 245874x245874, nnz 13150496, id=22.]

Fig. 4.3. Comparison of residual and solution norms for CG and MINRES solving two spd systems $Ax = b$ with $n$ = 82654 and 245874. Sometimes the solution norms take longer to reach the limiting value $\|x^*\|$.

Top: The values of $\log_{10}\|r_k\|$ are plotted against iteration number $k$. Bottom: The values of $\|x_k\|$ are plotted against $k$. Again the solution norms grow faster for CG.

[Figure 4.4: two panels for MINRES on problem (4.1), iterations 1 to 3. Left: ‖x_k‖ against iteration number. Right: ‖r_k‖/‖x_k‖ against iteration number.]

Fig. 4.4. For MINRES on the indefinite problem (4.1), $\|x_k\|$ and the backward error $\|r_k\|/\|x_k\|$ are both slightly non-monotonic.

4.2. Indefinite systems. A key part of Steihaug's trust-region method for large-scale unconstrained optimization [19] (see also [4]) is his proof that when CG is applied to a symmetric (possibly indefinite) system $Ax = b$, the solution norms $\|x_1\|, \dots, \|x_k\|$ are strictly increasing as long as $p_j^TAp_j > 0$ for all iterations $1 \le j \le k$. (We are using the notation in Table 2.1.)

From our proof of Theorem 2.2, we see that the same property holds for CR and MINRES as long as both $p_j^TAp_j > 0$ and $r_j^TAr_j > 0$ for all iterations $1 \le j \le k$. In case future research finds that MINRES is a useful solver in the trust-region context, it is of interest now to offer some empirical results about the behavior of $\|x_k\|$ when MINRES is applied to indefinite systems.

First, on the nonsingular indefinite system

\[ \begin{pmatrix} 2 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \end{pmatrix} x = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \tag{4.1} \]

MINRES gives non-monotonic solution norms, as shown in the left plot of Figure 4.4. The decrease in $\|x_k\|$ implies that the backward errors $\|r_k\|/\|x_k\|$ may not be monotonic, as illustrated in the right plot.
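The behavior on (4.1) can be reproduced with a few lines of NumPy. The sketch below does not use the Lanczos-based MINRES recurrences; instead it applies the defining property (2.1) directly, solving a small least-squares problem over each Krylov subspace, which gives the same iterates in exact arithmetic for this 3 × 3 example. The function name `min_residual_iterates` is an illustrative choice.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([0.0, 1.0, 1.0])

def min_residual_iterates(A, b):
    """Return x_1, ..., x_n with x_k minimizing ||b - A x|| over span{b, Ab, ..., A^(k-1) b}."""
    n = len(b)
    K = np.empty((n, n))
    K[:, 0] = b
    for j in range(1, n):
        K[:, j] = A @ K[:, j - 1]        # monomial Krylov basis (fine for n = 3)
    xs = []
    for k in range(1, n + 1):
        y, *_ = np.linalg.lstsq(A @ K[:, :k], b, rcond=None)
        xs.append(K[:, :k] @ y)
    return xs

xs = min_residual_iterates(A, b)
xnorm = [np.linalg.norm(x) for x in xs]
berr = [np.linalg.norm(b - A @ x) / np.linalg.norm(x) for x in xs]
print(xnorm)
print(berr)
```

Working the three steps by hand gives $\|x_1\| = 2\sqrt{2}/7 \approx 0.404$, $\|x_2\| = \sqrt{38}/19 \approx 0.324$, and $\|x_3\| = \|x^*\| = \sqrt{2} \approx 1.414$ with $x^* = (0, -1, 1)^T$, so the solution norm dips at the second iteration and the ratio $\|r_k\|/\|x_k\|$ rises there.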

More generally, we can gain an impression of the behavior of $\|x_k\|$ by recalling from Choi et al. [3] the connection between MINRES and MINRES-QLP. Both methods compute the iterates $x_k^M = V_k y_k^M$ in (2.1) from the subproblems

\[ y_k^M = \arg\min_{y \in \mathbb{R}^k} \|T_k y - \beta_1 e_1\| \quad \text{and possibly} \quad T_\ell y_\ell^M = \beta_1 e_1. \]

When $A$ is nonsingular or $Ax = b$ is consistent (which we now assume), $y_k^M$ is uniquely defined for each $k \le \ell$ and the methods compute the same iterates $x_k^M$ (but by different numerical methods). In fact they both compute the expanding QR factorizations

\[ Q_k \begin{bmatrix} T_k & \beta_1 e_1 \end{bmatrix} = \begin{bmatrix} R_k & t_k \\ 0 & \phi_k \end{bmatrix}, \]

(with $R_k$ upper tridiagonal) and MINRES-QLP also computes the orthogonal factorizations $R_k P_k = L_k$ (with $L_k$ lower tridiagonal), from which the kth solution estimate is defined by $W_k = V_k P_k$, $L_k u_k = t_k$, and $x_k^M = W_k u_k$. As shown in [3, §5.3], the construction of these quantities is such that the first $k - 3$ columns of $W_k$ are the same as in $W_{k-1}$, and the first $k - 3$ elements of $u_k$ are the same as in $u_{k-1}$. Since $W_k$ has orthonormal columns, $\|x_k^M\| = \|u_k\|$, where the first $k - 2$ elements of $u_k$ are unaltered by later iterations. As shown in [3, §6.5], it means that certain quantities can be cheaply updated to give norm estimates in the form

\[ \chi^2 \leftarrow \chi^2 + \mu_{k-2}^2, \qquad \|x_k^M\|^2 = \chi^2 + \mu_{k-1}^2 + \mu_k^2, \]

where it is clear that $\chi^2$ increases monotonically. Although the last two terms are of unpredictable size, $\|x_k^M\|^2$ tends to be dominated by the monotonic term $\chi^2$ and we can expect that $\|x_k^M\|$ will be approximately monotonic as $k$ increases from 1 to $\ell$.

[Figure 4.5: four panels for MINRES. Top row: log‖r‖ against iteration count. Bottom row: ‖x‖ against iteration count. Left column: Simon_olafu, Dim 16146x16146, nnz 1015156, id=32. Right column: Cannizzo_sts4098, Dim 4098x4098, nnz 72356, id=39.]

Fig. 4.5. Residual norms and solution norms when MINRES is applied to two indefinite systems $(A - \delta I)x = b$, where $A$ is each of the spd matrices used in Figure 4.2 ($n$ = 16146 and 4098) and $\delta = 0.5$ is large enough to make the systems indefinite.

Top: The values of $\log_{10}\|r_k\|$ are plotted against iteration number $k$ for the first $n$ iterations.

Bottom left: The values of $\|x_k\|$ are plotted against $k$. During the $n$ = 16146 iterations, $\|x_k\|$ increased 83% of the time and the backward errors $\|r_k\|/\|x_k\|$ (not shown here) decreased 96% of the time.

Bottom right: During the $n$ = 4098 iterations, $\|x_k\|$ increased 90% of the time and the backward errors $\|r_k\|/\|x_k\|$ (not shown here) decreased 98% of the time.

Experimentally we find that for most MINRES iterations on an indefinite problem, $\|x_k\|$ does increase. To obtain indefinite examples that were sensibly scaled, we used the four spd $(A, b)$ cases in Figures 4.2–4.3, applied diagonal scaling as before, and solved $(A - \delta I)x = b$ with $\delta = 0.5$ and where $A$ and $b$ are now scaled (so that $\mathrm{diag}(A) = I$). The number of iterations increased significantly but was limited to $n$.

Figure 4.5 shows $\log_{10}\|r_k\|$ and $\|x_k\|$ for the first two cases (where $A$ is each of the spd matrices in Figure 4.2). The values of $\|x_k\|$ are essentially monotonic. The backward errors $\|r_k\|/\|x_k\|$ (not shown) were even closer to being monotonic.

Figure 4.6 shows $\|x_k\|$ and $\log_{10}\|r_k\|$ for the second two cases (where $A$ is each of the spd matrices in Figure 4.3). The left example reveals a definite period of decrease in $\|x_k\|$. Nevertheless, during the $n$ = 82654 iterations, $\|x_k\|$ increased 83% of the time and the backward errors $\|r_k\|/\|x_k\|$ decreased 91% of the time. The right example is more like those in Figure 4.5. During $n$ = 245874 iterations, $\|x_k\|$ increased 88% of the time, the backward errors $\|r_k\|/\|x_k\|$ decreased 95% of the time (the values reported in the caption of Figure 4.6), and any nonmonotonicity was very slight.

[Figure 4.6: four panels for MINRES. Top row: log‖r‖ against iteration count. Bottom row: ‖x‖ against iteration count. Left column: Schmid_thermal1, Dim 82654x82654, nnz 574458, id=40. Right column: BenElechi_BenElechi1, Dim 245874x245874, nnz 13150496, id=48.]

Fig. 4.6. Residual norms and solution norms when MINRES is applied to two indefinite systems $(A - \delta I)x = b$, where $A$ is each of the spd matrices used in Figure 4.3 ($n$ = 82654 and 245874) and $\delta = 0.5$ is large enough to make the systems indefinite.

Top: The values of $\log_{10}\|r_k\|$ are plotted against iteration number $k$ for the first $n$ iterations.

Bottom left: The values of $\|x_k\|$ are plotted against $k$. There is a mild but clear decrease in $\|x_k\|$ over an interval of about 10000 iterations. During the $n$ = 82654 iterations, $\|x_k\|$ increased 83% of the time and the backward errors $\|r_k\|/\|x_k\|$ (not shown here) decreased 91% of the time.

Bottom right: The solution norms and backward errors are essentially monotonic. During the $n$ = 245874 iterations, $\|x_k\|$ increased 88% of the time and the backward errors $\|r_k\|/\|x_k\|$ (not shown here) decreased 95% of the time.


5. Conclusions. For full-rank least-squares problems $\min \|Ax - b\|$, the solvers LSQR [17, 18] and LSMR [7, 12] are equivalent to CG and MINRES on the (spd) normal equation $A^TAx = A^Tb$. Comparisons in [7] already indicated that LSMR can often stop much sooner than LSQR when Stewart's backward error norm for least-squares problems ($\|A^Tr_k\|/\|r_k\|$ [20]) is used in both cases. Our theoretical and experimental results here provide analogous evidence that MINRES can often stop much sooner than CG on spd systems when the stopping rule is based on backward error norms $\|r_k\|/\|x_k\|$ (or the more general norms in (3.5)–(3.6)).

By definition, MINRES minimizes ‖r_k‖ in the kth Krylov subspace, so that ‖r_k‖ decreases monotonically. Theorem 2.2 shows that MINRES shares a known property of CG: that ‖x_k‖ increases monotonically when A is spd. This implies that ‖x_k‖ is monotonic for LSMR (as conjectured in [7]), and suggests that MINRES may be a useful alternative to CG in the context of trust-region methods for optimization. Finally, while the CG energy norms of the errors (‖x* − x_k‖_A) are known to be monotonic, Theorem 2.4 shows that for MINRES on spd systems the error norms ‖x* − x_k‖ are monotonic.

Acknowledgements. We kindly thank Professor Mehiddin Al-Baali and other colleagues for organizing the Second International Conference on Numerical Analysis and Optimization at Sultan Qaboos University (January 3–6, 2011, Muscat, Sultanate of Oman). Their wish to publish some of the conference papers in a special issue of SQU Journal for Science gave added motivation for this research. We also thank Dr Sou-Cheng Choi for many helpful discussions of the iterative solvers, and Michael Friedlander for a final welcome tip.

REFERENCES

[1] M. Arioli, A stopping criterion for the conjugate gradient algorithm in a finite element method framework, Numer. Math., 97 (2004), pp. 1–24.

[2] Sou-Cheng Choi, Iterative Methods for Singular Linear Equations and Least-Squares Problems, PhD thesis, Stanford University, Stanford, CA, December 2006.

[3] S.-C. Choi, C. C. Paige, and M. A. Saunders, MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems, SIAM J. Sci. Comput., 33 (2011), pp. 1810–1836.

[4] A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Trust-region Methods, vol. 1, SIAM, Philadelphia, 2000.

[5] T. A. Davis, University of Florida Sparse Matrix Collection. http://www.cise.ufl.edu/research/sparse/matrices.

[6] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users’ Guide, SIAM, Philadelphia, 1979.

[7] D. C.-L. Fong and M. A. Saunders, LSMR: An iterative algorithm for sparse least-squares problems, SIAM J. Sci. Comput., to appear (2011).

[8] R. W. Freund, G. H. Golub, and N. M. Nachtigal, Iterative solution of linear systems, Acta Numerica, 1 (1992), pp. 57–100.

[9] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409–436.

[10] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, second ed., 2002.

[11] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards, 45 (1950), pp. 255–282.

[12] LSMR software for linear systems and least squares. http://www.stanford.edu/group/SOL/software.html.

[13] D. G. Luenberger, Hyperbolic pairs in the method of conjugate gradients, SIAM J. Appl. Math., 17 (1969), pp. 1263–1267.

[14] D. G. Luenberger, The conjugate residual method for constrained minimization problems, SIAM J. Numer. Anal., 7 (1970), pp. 390–398.

[15] G. A. Meurant, The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations, vol. 19 of Software, Environments, and Tools, SIAM, Philadelphia, 2006.

[16] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617–629.

[17] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Softw., 8 (1982), pp. 43–71.

[18] C. C. Paige and M. A. Saunders, Algorithm 583; LSQR: Sparse linear equations and least-squares problems, ACM Trans. Math. Softw., 8 (1982), pp. 195–209.

[19] T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983), pp. 626–637.

[20] G. W. Stewart, Research, development and LINPACK, in Mathematical Software III, J. R. Rice, ed., Academic Press, New York, 1977, pp. 1–14.

[21] E. Stiefel, Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme, Comm. Math. Helv., 29 (1955), pp. 157–179.

[22] Yong Sun, The Filter Algorithm for Solving Large-Scale Eigenproblems from Accelerator Simulations, PhD thesis, Stanford University, Stanford, CA, March 2003.

[23] David Titley-Peloquin, Backward Perturbation Analysis of Least Squares Problems, PhD thesis, School of Computer Science, McGill University, 2010.

[24] Henk A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge University Press, Cambridge, first ed., 2003.

[25] David S. Watkins, Fundamentals of Matrix Computations, Pure and Applied Mathematics, Wiley, Hoboken, NJ, third ed., 2010.