Scaling for Numerical Stability in Gaussian Elimination


Transcript of Scaling for Numerical Stability in Gaussian Elimination

Page 1: Scaling for Numerical Stability in Gaussian Elimination

Scaling for Numerical Stability in Gaussian Elimination

ROBERT D. SKEEL

University of Illinois at Urbana-Champaign, Urbana, Illinois

ABSTRACT. Roundoff error in the solution of linear algebraic systems is studied using a more realistic notion of what it means to perturb a problem, namely, that each datum is subject to a relatively small change. This is particularly appropriate for sparse linear systems. The condition number is determined for this approach. The effect of scaling on the stability of Gaussian elimination is studied, and it is discovered that the proper way to scale a system depends on the right-hand side. However, if only the norm of the error is of concern, then there is a good way to scale that does not depend on the right-hand side.

KEY WORDS AND PHRASES: numerical stability, Gaussian elimination, ill conditioning, scaling, equilibration, pivoting, backward error analysis, roundoff analysis, sparse Gaussian elimination

CR CATEGORIES: 5.11, 5.14

1. Introduction

One of the open problems in numerical analysis is that of scaling a general matrix. In their book, Forsythe and Moler [6] state that "the need for proper scaling of a matrix is very compelling if we are to devise a program to solve as many linear equation systems as possible," and yet "it is quite unclear to us how to program a reasonable scaling of a general matrix." Most algorithms for scaling (see Curtis and Reid [5] for a comparison) are based on the idea, as expressed by Stewart [21, p. 157], that "since the disparity in the sizes of the elements of A is responsible for the problem, it is natural to attempt to scale the rows and columns of A so that the matrix is balanced." However, in the next paragraph Stewart admits that "pat scaling strategies are suspect. In spite of intensive theoretical investigation, there is no satisfactory algorithm for scaling a general matrix." It is the purpose of this paper to do a careful error analysis for Gaussian elimination with pivoting and thus determine a theoretical solution of the problem of scaling a system of linear equations.

Our approach differs from previous efforts in two important ways. The first difference is that we scale the linear system rather than the matrix of coefficients. Much earlier thinking, influenced by the papers of Forsythe and Straus [7] and Bauer [2], was that scaling should reduce the condition number of the matrix. The second difference concerns the assessment of the roundoff errors in terms of "equivalent" datum errors. Instead of measuring the datum errors relative to the largest datum, we measure the individual relative errors in the data. Thus we use a more restrictive idea of what it means to perturb the problem slightly.

A system of n equations

Σⱼ₌₁ⁿ aᵢⱼ xⱼ = bᵢ,   1 ≤ i ≤ n,

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. This research was sponsored by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant AFOSR-75-2854. The United States Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. Author's address: Department of Computer Science, 222 Digital Computer Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801. © 1979 ACM 0004-5411/79/0700-0494 $00.75

Journal of the Association for Computing Machinery, Vol. 26, No. 3, July 1979, pp. 494-526


in n unknowns xᵢ, 1 ≤ i ≤ n, is often written in matrix notation as

Ax = b.

The solution x̄ computed in floating-point arithmetic is not generally exactly equal to x, but it still may be acceptable if it satisfies one of two criteria:

(i) It fulfills the accuracy requirements of the problem poser. This usually involves some measure of how far x̄ is from x. For example, in determining the coefficients of an interpolating polynomial, it is some norm of the residual r = A(x − x̄) which is of concern. More often, though, it is some norm of the error x̄ − x which is of concern.

(ii) It is the exact solution of a problem which differs from the given problem by less than the uncertainty in the data, so that it is theoretically possible that x̄ exactly solves the original problem. Uncertainty in the data is generally present if only because of the roundoff error introduced when the numbers are put into the computer.

Assessment of an algorithm by the first and second criteria is the goal of forward and backward error analysis, respectively.

For either kind of error analysis, it is helpful to have some reasonable standard for comparison. As our standard we consider the effects of introducing errors of limited size ε into the data. However, instead of requiring that the norm of the datum errors be bounded by ε times the norm of the data, we require that each individual datum error be bounded in absolute value by ε times the absolute value of the datum. There are a number of reasons for believing that this second approach is more realistic. First, most numerical computations are done in floating-point rather than fixed-point arithmetic, and for floating-point computation the conversion of data to internal representation results in errors of the same relative size. Second, measurement errors are usually more nearly the same in relative size than in absolute size. Third, this approach does not permit the introduction of errors into zero elements, which is appropriate when sparse matrix techniques are employed because nonstored zeros are not part of the input data of an algorithm (cf. Miller [16]). Furthermore, the zeros of a sparse matrix are usually exact, and it would be inappropriate to introduce an error, which, for example, might correspond to the introduction of a connection between two junctions of an electrical network.

In forward error analysis we obtain a bound on the norm of the error or whatever we are using to measure the goodness of the approximate solution. For comparison purposes we determine the least amount by which A and b must be perturbed to get a solution that is equally bad. This requires a knowledge of the condition number, which measures the sensitivity of the solution to changes in the data. In Section 2 the condition number of a linear system is determined to be

‖ |A⁻¹||A||x| + |A⁻¹||b| ‖ / ‖x‖,

where ‖·‖ is the max norm and the absolute value of an array means that the components are replaced by their absolute values. Our treatment of condition gives mathematical form to the informal discussion of ill conditioning found in Hamming's [9] book.
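As a concrete illustration, this condition number is easy to evaluate with NumPy. The sketch below is ours, not the paper's (the function name and test systems are invented for illustration); it uses the max norm throughout:

```python
import numpy as np

def skeel_condition(A, b):
    """|| |A^-1||A||x| + |A^-1||b| || / ||x|| in the max norm."""
    x = np.linalg.solve(A, b)
    Ainv = np.linalg.inv(A)
    numerator = np.linalg.norm(
        np.abs(Ainv) @ np.abs(A) @ np.abs(x) + np.abs(Ainv) @ np.abs(b), np.inf)
    return numerator / np.linalg.norm(x, np.inf)

# A diagonal system is perfectly conditioned in this measure, no matter
# how wildly its rows are scaled:
print(skeel_condition(np.diag([1.0, 1e-10]), np.ones(2)))  # ≈ 2.0 regardless of the 1e-10 row
```

Note that the classical κ(A) = ‖A⁻¹‖‖A‖ would report the same diagonal system as hopelessly ill conditioned.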

In backward error analysis we obtain a bound on the "backward error," which is the least amount by which A and b must be perturbed to get a solution that is equal to x̄. (Other definitions for the backward error are possible, but we show that they are roughly equivalent.) The starting point of such an analysis is an a posteriori expression for the backward error,

maxᵢ |rᵢ| / (|A||x̄| + |b|)ᵢ,

which was obtained by Oettli and Prager [17]. If the backward error is less than the unit roundoff error u, then the second acceptability criterion is satisfied. And if it can be shown


that an algorithm produces a backward error that is always bounded by some fixed multiple K(n)u of the unit roundoff error, then by increasing the precision of the intermediate results by a factor K(n), the second acceptability criterion can be met. An algorithm with this desirable property is said to be stable. For some algorithms it is difficult to determine whether or not they are stable, but it can be shown that they satisfy a weaker condition called asymptotic stability. This essentially means that the algorithm is stable for infinitesimal values of u.
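The Oettli–Prager expression above makes the backward error of any computed solution directly checkable. A minimal NumPy sketch (the test system is our own invention, and we assume |A||x̄| + |b| > 0 so the componentwise quotient is defined):

```python
import numpy as np

def backward_error(A, b, x_hat):
    """Componentwise relative backward error max_i |r_i| / (|A||x_hat| + |b|)_i."""
    r = b - A @ x_hat
    return np.max(np.abs(r) / (np.abs(A) @ np.abs(x_hat) + np.abs(b)))

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

# A solution from a stable solver has backward error near the unit roundoff:
print(backward_error(A, b, np.linalg.solve(A, b)))   # tiny, ~1e-17

# A deliberately sloppy "solution" has a large backward error:
print(backward_error(A, b, np.array([0.5, 0.25])))   # 3/11 ≈ 0.27
```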

The stability and accuracy of Gaussian elimination with column pivoting (the usual variant of partial pivoting in which the largest element of the next column is used for a pivot) is examined in Section 4. Through a counterexample it is shown that Gaussian elimination is not asymptotically stable for any pivoting strategy that depends only on the matrix of coefficients. Thus the common practice of separating the matrix factorization from the solution of the linear system leads to numerical instability. Then by means of a careful error analysis performed in Appendix A, a bound on the backward error is obtained that contains the quantity

maxᵢ (|D₁⁻¹A||x̄|)ᵢ / minⱼ (|D₁⁻¹A||x̄|)ⱼ,

where D₁⁻¹ is the matrix of row scaling factors. This quantity is minimized by choosing

D₁ = diag(|A||x̄|),

which calls for the ith row to be divided by |aᵢ₁x̄₁| + |aᵢ₂x̄₂| + ⋯ + |aᵢₙx̄ₙ|. It is shown that with such a choice for D₁, column pivoting would be stable. Of course this is impractical, which explains why there is no satisfactory algorithm for scaling a general matrix. Nonetheless, the ratio

maxᵢ (|A||x̄|)ᵢ / minⱼ (|A||x̄|)ⱼ

is an excellent a posteriori measure of how poorly scaled the system is. Finally, a forward error analysis shows that the problem of scaling for accuracy has many solutions, one of which depends only on the coefficient matrix.
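The ratio above can be monitored cheaply once a computed solution is available. The sketch below (our illustration; the ill-scaled test system is invented) computes the measure and applies the row scaling D₁ = diag(|A||x̄|) suggested by the analysis:

```python
import numpy as np

def ill_scaling_ratio(A, x_hat):
    """max_i (|A||x_hat|)_i / min_i (|A||x_hat|)_i; near 1 means well scaled."""
    s = np.abs(A) @ np.abs(x_hat)
    return s.max() / s.min()

def row_scale(A, b, x_hat):
    """Divide equation i by (|A||x_hat|)_i, i.e. apply D1^-1 with D1 = diag(|A||x_hat|)."""
    s = np.abs(A) @ np.abs(x_hat)
    return A / s[:, None], b / s

A = np.array([[1e8, 2e8], [3.0, 4.0]])   # rows of wildly different size
b = np.array([1e8, 5.0])
x_hat = np.linalg.solve(A, b)            # solution is (3, -1)

print(ill_scaling_ratio(A, x_hat))       # ~3.8e7: badly scaled
A1, b1 = row_scale(A, b, x_hat)
print(ill_scaling_ratio(A1, np.linalg.solve(A1, b1)))  # ~1 after scaling
```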

Sometimes, programming considerations (Sherman [19]) call for the use of row pivoting instead of column pivoting, where by row pivoting we mean that columns are interchanged so that each pivot is the largest in its row. In Section 5 it is shown that row pivoting could be made stable if it were somehow possible to scale the columns with the matrix of scale factors

D₂ = diag(|x̄|).

This calls for each column to be multiplied by its corresponding computed solution value. A measure of ill scaling is given by

maxᵢ (|A|e)ᵢ ‖x̄‖ / (|A||x̄|)ᵢ,

where e is the vector of all ones. Again, the problem of scaling for accuracy has a solution not depending on the right-hand side.

Column pivoting may be regarded as the generalization of complete pivoting in which the ordering of the columns is arbitrary, and similarly row pivoting as the generalization in which the ordering of the rows is arbitrary. From this observation it follows that the results of both Sections 4 and 5 apply to complete pivoting.

Before proceeding it might be interesting to demonstrate the instability of complete pivoting with a simple 2 × 2 system of equations. Consider Ax = b where

A = [~ 30] and b = [01].


The coefficient matrix A is equilibrated according to the definition of Forsythe and Moler [6, p. 45]. Using rounded t-digit decimal (floating-point) arithmetic, the elimination step yields A′x = b′ where

and so the computed solution is

x̄ = (0.33…3 × 10⁻ᵗ, 0.33…3)ᵀ.

The backward error is determined by considering perturbed problems of the form

and choosing the relative changes δᵢⱼ so as to minimize the maximum |δᵢⱼ|. In this case, δ₂₁ must be chosen to be −1, and so the backward error is 100 percent regardless of the precision t.

2. Condition of Linear Systems

The condition of a problem is the sensitivity of its solution to uncertainties in the problem data. The importance of this concept is that it indicates the amount of accuracy that one should reasonably expect for the solution of a problem with inexact data. And even for problems with exact data, the conversion of the numbers to the computer's floating-point number base usually introduces errors.

As the measure of the condition of a problem, we take the maximum amount by which an infinitesimal perturbation in the problem data can be amplified in the solution. More precisely, if ξ denotes the given problem data and φ(ξ) denotes the solution of a problem with data ξ, then we define the condition number to be

lim sup_{ξ̃ → ξ} (relative distance from φ(ξ̃) to φ(ξ)) / (relative distance from ξ̃ to ξ).  (2.1)

In the case where ξ and φ(ξ) are scalars, the condition number is the absolute value of the relative derivative, namely,

|ξ φ′(ξ) / φ(ξ)|

(cf. Bauer [4]). For linear algebraic systems Ax = b, we have ξ = (A, b) and φ(ξ) = A⁻¹b. (In roundoff analysis the number of equations n is not considered to be part of the problem data; rather we take the point of view that each value of n defines a separate class of problems.) There are two crucial matters that have to be settled: (i) how to define relative distance in the problem space, and (ii) how to define relative distance in the solution space.

Any problem that is to be solved in an approximate sense is incomplete unless there is also some "metric" specified for measuring how good the approximation is. It is this metric that should be used in defining the "relative distance from φ(ξ̃) to φ(ξ)." Often this metric measures how close the approximate solution is to the true solution; in other cases it measures how well the approximate solution satisfies the problem. The actual choice of the metric depends on the purpose of the computation, but for most applications one can use the ratio

‖x̃ − x‖ / ‖x‖,

where ‖·‖ is some suitable norm. (Note that if ‖·‖ is a norm and A is nonsingular, then ‖A·‖ is a norm.) In this paper we select the max norm for ‖·‖ because of its convenience.


It is adequate for many purposes if appropriate units are chosen for the unknowns. The question of how to measure the "relative distance from ξ̃ to ξ" is more difficult to answer because a completely specified approximation problem need not include a metric for the problem space. However, there is one metric that is always safe to use, namely, the componentwise relative error

maxᵢ |ξ̃ᵢ − ξᵢ| / |ξᵢ|.

If the value of this quantity is small, then ξ̃ is close to ξ by any reasonable standard, especially in view of the fact that putting data into the computer results in small componentwise errors. This metric has another advantage in that it is always meaningful regardless of the physical dimensions of the problem data, and thus it is independent of possibly arbitrary choices of units for the data. For these reasons we take as our measure of relative distance the smallest ε ≥ 0 such that

|ãᵢⱼ − aᵢⱼ| ≤ ε|aᵢⱼ| and |b̃ᵢ − bᵢ| ≤ ε|bᵢ|.

This seems to be consistent with the ideas expressed by Hamming [9]. On page 117 it is stated that

the term "ill-conditioned" is ill defined. The vague idea is that small changes in the initial system can produce large changes in the final result. If we are to take floating point seriously, then we should say "relatively small changes" and "relatively large changes."

And on page 122 it is stated that "the system is indeed ill conditioned because, no matter how we try, we are unable to solve the system so that the answer is not sensitive to small changes in the original coefficients." Thus it seems that by "relatively small changes in the initial system" Hamming means relatively small changes in the coefficients of the initial system. A similar thought is expressed by Kahan [11, p. 795]. In fact, measuring the relative error in individual numbers seems to be the usual approach for the general theory of roundoff analysis [4; 12; 21, Ch. 2], although this is often not true in actual applications of roundoff analysis.
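The componentwise relative distance just adopted is simple to compute, and it behaves very differently from a normwise measure when exact zeros are involved. A short sketch (the function name and data are ours, for illustration only):

```python
import numpy as np

def rel_distance(A_tilde, A):
    """Smallest eps >= 0 with |A_tilde - A| <= eps|A| componentwise;
       +inf if an exact zero datum has been perturbed."""
    diff = np.abs(np.asarray(A_tilde) - np.asarray(A))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = diff / np.abs(np.asarray(A))
    ratio = np.where(diff == 0, 0.0, ratio)   # unchanged entries cost nothing
    return float(ratio.max())

A = np.array([[2.0, 0.0], [1.0, 4.0]])
print(rel_distance(A * (1 + 1e-9), A))                           # ≈ 1e-9
print(rel_distance(A + np.array([[0.0, 1e-9], [0.0, 0.0]]), A))  # inf: a zero was perturbed
```

The second call illustrates the point made about sparse matrices: a perturbation that is negligible in norm is infinitely large in this metric if it touches a structural zero.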

Having chosen our metrics, we are in a position to determine the condition number of a system A x = b. We begin by obtaining bounds on the uncertainty in the solution due to the uncertainty in A and b. Bounds of this type also appear in Bauer [3]. Our notation uses inequalities between arrays to mean inequality of the corresponding components.

THEOREM 2.1. Let Ax = b and (A + δA)(x + δx) = b + δb where |δA| ≤ ε|A| and |δb| ≤ ε|b|. Then

‖δx‖/‖x‖ ≤ ε ‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ((1 − ε ‖|A⁻¹||A|‖) ‖x‖),

provided that the denominator is positive.

PROOF. We have that

δx = −A⁻¹ δA (x + δx) + A⁻¹ δb,  (2.2)

and so

|δx| ≤ |A⁻¹||δA|(|x| + |δx|) + |A⁻¹||δb|
    ≤ ε |A⁻¹||A|(|x| + |δx|) + ε |A⁻¹||b|.

Therefore

‖δx‖ ≤ ε ‖|A⁻¹||A||x| + |A⁻¹||b|‖ + ε ‖|A⁻¹||A|‖ ‖δx‖.  Q.E.D.

Remark. Bauer [3] shows that the bound of Theorem 2.1 can be improved by replacing (1 − ε‖|A⁻¹||A|‖)⁻¹ by ‖(I − ε|A⁻¹||A|)⁻¹‖, and so it is not necessary that ε‖|A⁻¹||A|‖ < 1 but only that the spectral radius of ε|A⁻¹||A| be less than 1.
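Theorem 2.1 can be exercised numerically: for componentwise perturbations of relative size ε, the observed amplification ‖δx‖/(ε‖x‖) should never exceed the condition number, up to first order in ε. A Monte-Carlo sketch under those assumptions (the test system is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, 1.0])
x = np.linalg.solve(A, b)
Ainv = np.linalg.inv(A)

cond = np.linalg.norm(np.abs(Ainv) @ np.abs(A) @ np.abs(x)
                      + np.abs(Ainv) @ np.abs(b), np.inf) / np.linalg.norm(x, np.inf)

eps = 1e-8
worst = 0.0
for _ in range(1000):
    dA = eps * np.abs(A) * rng.uniform(-1.0, 1.0, A.shape)   # |dA| <= eps|A|
    db = eps * np.abs(b) * rng.uniform(-1.0, 1.0, b.shape)   # |db| <= eps|b|
    dx = np.linalg.solve(A + dA, b + db) - x
    worst = max(worst, np.linalg.norm(dx, np.inf) / (eps * np.linalg.norm(x, np.inf)))

print(worst, "<=", cond)   # the amplification stays below the condition number
```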

THEOREM 2.2. Let Ax = b. Then there exist δA and δb such that |δA| = ε|A|, |δb| = ε|b|, and the solution x + δx of (A + δA)(x + δx) = b + δb satisfies

‖δx‖/‖x‖ ≥ ε ‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ((1 + ε ‖|A⁻¹||A|‖) ‖x‖).

PROOF. Let l be such that

(|A⁻¹||A||x| + |A⁻¹||b|)ₗ = ‖|A⁻¹||A||x| + |A⁻¹||b|‖.

Define δA and δb by

δaⱼₖ = sgn(αₗⱼ xₖ) ε |aⱼₖ|

and

δbⱼ = −sgn(αₗⱼ) ε |bⱼ|,

where A⁻¹ = (αᵢⱼ). Then

(A⁻¹ δA x − A⁻¹ δb)ₗ = Σⱼ Σₖ αₗⱼ δaⱼₖ xₖ − Σⱼ αₗⱼ δbⱼ
  = ε (|A⁻¹||A||x| + |A⁻¹||b|)ₗ
  = ε ‖|A⁻¹||A||x| + |A⁻¹||b|‖,

but from (2.2)

(A⁻¹ δA x − A⁻¹ δb)ₗ = −(δx + A⁻¹ δA δx)ₗ,

and so

ε ‖|A⁻¹||A||x| + |A⁻¹||b|‖ ≤ ‖δx‖ + ε ‖|A⁻¹||A|‖ ‖δx‖.  Q.E.D.

THEOREM 2.3. The condition number, as defined by (2.1), of a linear algebraic system Ax = b is

‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ‖x‖.

PROOF. The condition number of a linear algebraic system Ax = b is

lim sup_{ε(δA, δb) → 0} (‖δx‖/‖x‖) / ε(δA, δb),

where ε(δA, δb) = min{ε ≥ 0 : |δA| ≤ ε|A|, |δb| ≤ ε|b|} and δx satisfies (A + δA)(x + δx) = b + δb. Consider any sequence (δAₘ, δbₘ) for which ε(δAₘ, δbₘ) → 0 as m → ∞. By Theorem 2.1 we have

‖δxₘ‖/‖x‖ ≤ ε(δAₘ, δbₘ) ‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ((1 − ε(δAₘ, δbₘ) ‖|A⁻¹||A|‖) ‖x‖)

for sufficiently large m. Therefore

lim sup_{m → ∞} (‖δxₘ‖/‖x‖) / ε(δAₘ, δbₘ) ≤ ‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ‖x‖,

which gives an upper bound on the condition number. Let εₘ be a sequence converging to zero. By Theorem 2.2 there exists a sequence (δAₘ, δbₘ) such that ε(δAₘ, δbₘ) = εₘ and

‖δxₘ‖/‖x‖ ≥ εₘ ‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ((1 + εₘ ‖|A⁻¹||A|‖) ‖x‖).

Hence

limₘ (‖δxₘ‖/‖x‖) / εₘ ≥ ‖|A⁻¹||A||x| + |A⁻¹||b|‖ / ‖x‖,

which gives a lower bound on the condition number. Q.E.D.

In subsequent sections of this paper, we will consider the effects of perturbing only the

elements of the coefficient matrix.

THEOREM 2.4. Let Ax = b and (A + δA)(x + δx) = b where |δA| ≤ ε|A|. Then

‖δx‖/‖x‖ ≤ ε ‖|A⁻¹||A||x|‖ / ((1 − ε ‖|A⁻¹||A|‖) ‖x‖).

PROOF. Similar to that of Theorem 2.1. Q.E.D.

THEOREM 2.5. Let Ax = b. Then there exists δA such that |δA| = ε|A| and such that the solution x + δx of (A + δA)(x + δx) = b satisfies

‖δx‖/‖x‖ ≥ ε ‖|A⁻¹||A||x|‖ / ((1 + ε ‖|A⁻¹||A|‖) ‖x‖).

PROOF. Similar to that of Theorem 2.2. Q.E.D.

It follows from these last two theorems that when only A is subject to uncertainty, the condition number is

Cond(A, x) = ‖|A⁻¹||A||x|‖ / ‖x‖.

Since ‖|A⁻¹||A||x|‖ ≤ ‖|A⁻¹||A||x| + |A⁻¹||b|‖ ≤ 2‖|A⁻¹||A||x|‖, Cond(A, x) is also adequate for the case where both A and b are subject to uncertainty. A somewhat similar quantity

‖A⁻¹‖ Σⱼ ‖Ae₍ⱼ₎‖ |xⱼ| / ‖x‖

is used by Van der Sluis [22], which he calls the "condition number of the solution" [23]. Here e₍ⱼ₎ denotes the jth unit vector.

The condition number of a matrix A could be defined as the maximum value of Cond(A, x), which is achieved with x = e = (1, 1, …, 1)ᵀ. Thus

Cond(A) = Cond(A, e) = ‖|A⁻¹||A|‖.

This quantity is more satisfying as a measure of ill condition than the usual κ(A) = ‖A⁻¹‖ ‖A‖ (cf. Bauer [1]) for a couple of reasons. First, the matrix |A⁻¹||A| is a mapping of the solution space into itself, which means that the quantity ‖|A⁻¹||A|‖ can be defined entirely in terms of the solution space norm; whereas the definition κ(A) = ‖A⁻¹‖ ‖A‖ requires both a solution space norm and a residual space norm. Second, the quantity Cond(A) is invariant under row scaling. Multiplying a system of equations by a diagonal matrix does not change the problem in any fundamental way. For example, a diagonal system Dx = b seems to be well conditioned. Accordingly, we have that Cond(D) = 1, whereas κ(D) can be arbitrarily large.
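A quick numerical contrast between the two measures (our sketch; the matrices are invented for illustration):

```python
import numpy as np

def cond_skeel(A):
    """Cond(A) = || |A^-1||A| ||_inf; invariant under row scaling."""
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), np.inf)

def kappa(A):
    """Classical kappa(A) = ||A^-1||_inf ||A||_inf."""
    return np.linalg.norm(np.linalg.inv(A), np.inf) * np.linalg.norm(A, np.inf)

D = np.diag([1.0, 1e-12])
print(cond_skeel(D))   # ≈ 1.0: a diagonal system is well conditioned
print(kappa(D))        # ~1e12: kappa can be made arbitrarily large

# Row scaling changes kappa but leaves Cond untouched:
A = np.array([[1.0, 2.0], [3.0, 4.0]])
S = np.diag([1.0, 1e6])
print(cond_skeel(S @ A), cond_skeel(A))   # equal up to roundoff
```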

Example. According to Hamming [9, p. 120], the system Ax = b is well conditioned where

A 2e 2 , b = ~ 6E

2e - 2~

The inverse of the coefficient matrix and the solution are given below:

= ~ 0.4 -O. le -1 - 0.3 0.2e -a - 0.6 , x = . 1 - 1.8¢ 0.2 0.2~ -a - 0.6 -0.4e -a 0.6_]

Hence

|A⁻¹||A| = (1/(1 − 1.8ε)) ×
  [ 1 + 1.8ε       2.4ε         1.6ε
    0.4ε⁻¹ + 1.2   1.4 − 0.6ε   0.8
    0.8ε⁻¹         1.6          1 − 0.6ε ]


and

|A⁻¹||A||x| + |A⁻¹||b| = (1/(1 − 1.8ε)) (9.6ε + 3.6ε², 4.8 + 2.4ε, 6 − 2.4ε)ᵀ,

which shows that the system is well conditioned. However,

Cond(A) = (0.8ε⁻¹ + 2.6 − 0.6ε) / (1 − 1.8ε),

which indicates that the system would be ill conditioned for some different right-hand side b, and in fact, Hamming [9, p. 122] gives such an example.

3. Stability of Algorithms for Linear Systems

Let ⊕, ⊖, ⊗, ⊘ denote the floating-point operations corresponding to +, −, ×, /. Every reference to a floating-point result x ⊙ y carries with it the assumption that x, ⊙, and y are such that the result is well defined. Nothing is assumed about the floating-point arithmetic except that the relative roundoff error is bounded by u/(1 + u), where the unit roundoff error u is a small positive number; that is,

x ⊙ y = (x ∘ y)(1 + δ)

for some δ depending on x, ∘, and y, which satisfies

|δ| ≤ u/(1 + u).

It follows from the above condition that

x ⊙ y = (x ∘ y)/(1 + δ′),

where |δ′| ≤ u. Note that for rounding u = ½β^(1−t) and for chopping u = β^(1−t), where β is the base and t is the number of base-β digits in the fraction of the floating-point numbers. (However, u is infinite for the floating-point arithmetic of computers that have no guard digit or that truncate before normalizing.)
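For IEEE double precision we have β = 2 and t = 53 with rounding, so u = ½β^(1−t) = 2⁻⁵³. The sketch below (ours, not the paper's) checks the assumed model x ⊕ y = (x + y)(1 + δ), |δ| ≤ u/(1 + u), on one concrete addition by comparing against exact rational arithmetic:

```python
from fractions import Fraction

u = Fraction(1, 2**53)              # unit roundoff: (1/2) * beta**(1-t), beta=2, t=53

x, y = 0.1, 0.2
exact = Fraction(x) + Fraction(y)   # exact sum of the two stored operands
computed = Fraction(x + y)          # the floating-point sum fl(x + y)

delta = abs(computed - exact) / exact
assert delta <= u / (1 + u)         # the relative roundoff model holds
print(float(delta))
```

The famous discrepancy 0.1 + 0.2 ≠ 0.3 is exactly this δ at work; correct rounding guarantees it never exceeds u/(1 + u).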

For any computed solution x̄ we define the relative backward error to be the smallest real number η₃ such that

(A + δA)(x̄ − δx) = b + δb

for some δA, δb, and δx with |δA| ≤ η₃|A|, |δb| ≤ η₃|b|, and |δx| ≤ η₃|x̄ − δx|. It is not immediately obvious that there is such a smallest η₃; however, it is straightforward to show that a minimum does exist provided we define min ∅ = +∞. The backward error can be interpreted in the following way: the computed solution x̄ is the rounded solution of a problem with rounded data, where η₃ is the maximum relative roundoff error. Thus if η₃ is no larger than the unit roundoff error u, then our solution is as good as our data deserve; otherwise, improved accuracy may be justified, perhaps by using iterative improvement. Further motivation for this definition is given in Miller [14] and Bauer [4]. If the algorithm fails to compute a solution because the matrix is nearly singular, then the backward error is defined to be the smallest real number η₃ such that A + δA is singular for some δA with |δA| ≤ η₃|A|. Note that the definition of the backward error is independent of our selection of the max norm to measure the accuracy of the computed solution.

Stability of the algorithm means that there exist a stability constant K(n) and a stability threshold ū(n) > 0, both independent of the problem data (A, b), such that the relative backward error

η₃ ≤ K(n)u

provided that u ≤ ū(n). A weaker concept, asymptotic stability, allows the threshold ū(n) to


be data dependent. These two types of stability are the same as the "backward stability" and "asymptotic backward stability" used by Miller [13].

Note. The error bounds implied by asymptotic stability must be satisfied by any kind of arithmetic having sufficiently small relative roundoff errors. For realistic floating-point arithmetic any division-free algorithm with fixed data (A, b) is exact for sufficiently great precision. Thus to prove that a division-free algorithm is not asymptotically stable, one must resort to contrived types of arithmetic.

The backward error η₃ is not easy to determine, and for this reason we introduce two variants of the backward error which are easier to compute. Let η₂ be the smallest real number such that

(A + δA)x̄ = b + δb

for some δA and δb with |δA| ≤ η₂|A| and |δb| ≤ η₂|b|. Let η₁ be the smallest real number such that

(A + δA)x̄ = b

for some δA with |δA| ≤ η₁|A|. Naturally η₃ ≤ η₂ ≤ η₁.

Following is a result due to Oettli and Prager [17], slightly modified to avoid division by zero.

THEOREM 3.1. The backward error η₂ is the smallest real number η such that

|r| ≤ η(|A||x̄| + |b|);

so, in particular, if |A||x̄| + |b| > 0, then

η₂ = maxᵢ |rᵢ| / (|A||x̄| + |b|)ᵢ,

where division of two vectors is defined componentwise.

PROOF. Let η be the smallest real number that satisfies the inequality of the theorem. It is sufficient to show that η ≤ η₂ and that η₂ ≤ η. First, suppose that η₂ < +∞. Then there exist δA and δb such that (A + δA)x̄ = b + δb where |δA| ≤ η₂|A| and |δb| ≤ η₂|b|. We have

|r| = |δA x̄ − δb| ≤ |δA||x̄| + |δb| ≤ η₂(|A||x̄| + |b|),  (3.1)

and so η ≤ η₂. Second, suppose that η < +∞. Then there exists a diagonal matrix H such that b − Ax̄ = H(|A||x̄| + |b|) and |H| ≤ η. Define

δA = H|A| diag(sgn x̄)

and

δb = −H|b|.

We have that δA x̄ − δb = b − Ax̄, or (A + δA)x̄ = b + δb, and that |δA| ≤ η|A| and |δb| ≤ η|b|; and so η₂ ≤ η. Q.E.D.
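The construction in the second half of the proof is fully explicit, and it is instructive to carry it out numerically: the perturbations δA = H|A| diag(sgn x̄) and δb = −H|b| make x̄ an exact solution while staying componentwise within η₂. A NumPy sketch with an invented 2 × 2 system:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([2.0, 1.0])
x_hat = np.array([0.5, 0.25])        # a deliberately rough "computed" solution

r = b - A @ x_hat
denom = np.abs(A) @ np.abs(x_hat) + np.abs(b)
eta2 = np.max(np.abs(r) / denom)

# The diagonal matrix H of the proof, with b - A x_hat = H (|A||x_hat| + |b|):
H = np.diag(r / denom)
dA = H @ np.abs(A) @ np.diag(np.sign(x_hat))
db = -H @ np.abs(b)

assert np.allclose((A + dA) @ x_hat, b + db)           # x_hat solves the perturbed system
assert np.all(np.abs(dA) <= eta2 * np.abs(A) + 1e-15)  # |dA| <= eta2 |A|
assert np.all(np.abs(db) <= eta2 * np.abs(b) + 1e-15)  # |db| <= eta2 |b|
print(eta2)   # 1/15 ≈ 0.0667 for this data
```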

THEOREM 3.2. The backward error η₁ is the smallest real number η such that

|r| ≤ η|A||x̄|;

so, in particular, if |A||x̄| > 0, then

η₁ = maxᵢ |rᵢ| / (|A||x̄|)ᵢ.


PROOF. Similar to Theorem 3.1. Q.E.D.

Remark. Similar types of results apply to other problems; for example, the backward error for an algebraic equation a₀xⁿ + a₁xⁿ⁻¹ + ⋯ + aₙ = 0 is given by

η₂ = |a₀x̄ⁿ + a₁x̄ⁿ⁻¹ + ⋯ + aₙ| / (|a₀x̄ⁿ| + |a₁x̄ⁿ⁻¹| + ⋯ + |aₙ|).

The following theorem gives bounds on the relative backward error η₃ in terms of the more easily computed η₁ and η₂. These bounds show that the η's are roughly the same size when the backward error is small, and so in the remainder of the paper only the quantity η₁ is used, which we denote simply by η.

THEOREM 3.3. The three types of backward error satisfy

η₂/(2 + η₂) ≤ η₃ ≤ η₂,  (3.2)

η₁/(3 + (5/3)η₁) ≤ η₃ ≤ η₁,  (3.3)

η₁/(2 + η₁) ≤ η₂ ≤ η₁.  (3.4)

PROOF. The second inequalities of (3.2), (3.3), and (3.4) are obvious. The first inequality of (3.2) is obvious if η₃ ≥ 1; hence assume η₃ < 1. There exist δA, δb, δx such that

(A + δA)(x̄ − δx) = b + δb

where |δA| ≤ η₃|A|, |δb| ≤ η₃|b|, and |δx| ≤ η₃|x̄ − δx|. Hence

|r| = |δA(x̄ − δx) − A δx − δb| ≤ 2η₃|A||x̄ − δx| + η₃|b|.  (3.5)

It is easily shown that

|x̄ − δx| ≤ (1/(1 − η₃)) |x̄|,  (3.6)

and so

|r| ≤ (2η₃/(1 − η₃)) (|A||x̄| + |b|).

Therefore η₂ ≤ 2η₃/(1 − η₃), which verifies (3.2). The first inequality of (3.3) is obvious if η₃ ≥ 3/5; hence assume η₃ < 3/5. From (3.5)

|r| ≤ 2η₃|A||x̄ − δx| + η₃|r| + η₃|A||x̄|,

and using (3.6) gives

η₁ ≤ η₃(3 − η₃)/(1 − η₃)²,

and so

η₃ ≥ η₁/(3 + (5/3)η₁),

which proves (3.3). The first inequality of (3.4) is obvious if η₂ ≥ 1; hence assume η₂ < 1. We have

|r| ≤ η₂(|b| + |A||x̄|) ≤ η₂(|r| + 2|A||x̄|),

and so

|r| ≤ (2η₂/(1 − η₂)) |A||x̄|.

Therefore η₁ ≤ 2η₂/(1 − η₂), which implies (3.4). Q.E.D.

A good algorithm should (i) return an acceptable answer most of the time (robustness)

and (ii) signal failure whenever it does not return an acceptable answer (rehabdity). We could formally define an algorithm to be reliable if there exist K(n) and if(n) such that for any (A, b) and any u _< if(n), either the algorithm computes an answer with ~/_< K(n)u or the algorithm signals failure.

Any algorithm for solving linear systems can be made reliable by computing the backward error with floating-point arithmetic and then accepting the answer only if the computed backward error is less than a prescribed multiple of the unit roundoff error. For example, the next theorem shows that if the computed backward error $\hat\eta \le Ku$, then we can conclude that $\eta \le (K + n)u\,e^{(n+2)u}$.

The residual is to be computed in single precision,

$$\hat r_i = b_i - (\cdots(a_{i1} \times \hat x_1 + a_{i2} \times \hat x_2)\cdots + a_{in} \times \hat x_n),$$

where $+$, $-$, $\times$ denote single-precision floating-point operations, or in double precision,

$$\hat r_i = \mathrm{fl}(b_i \mathbin{\ddot-} (\cdots(a_{i1} \mathbin{\ddot\times} \hat x_1 \mathbin{\ddot+} a_{i2} \mathbin{\ddot\times} \hat x_2)\cdots \mathbin{\ddot+} a_{in} \mathbin{\ddot\times} \hat x_n)).$$

Here $\ddot\circ$ denotes the double-precision counterpart of an operation $\circ$, where it is assumed that

$$x \mathbin{\ddot\circ} y = (x \circ y)(1 + \delta)$$

with $|\delta| \le u^2/(1 + u^2)$. In practice the double-precision unit roundoff error is either this small (rounding in base two) or smaller. By $\mathrm{fl}(\cdot)$ we mean the conversion of a double-precision value to a single-precision value. It is assumed that $\mathrm{fl}(x \mathbin{\ddot\circ} y) = (x \circ y)(1 + \delta)$ with $-u/(1 + u) \le \delta \le u$, which is true for rounding and for chopping. The computed backward error $\hat\eta$ is determined in single precision by

$$\hat\eta = \max_i\, |\hat r_i| \div (\cdots(|a_{i1} \times \hat x_1| + |a_{i2} \times \hat x_2|)\cdots + |a_{in} \times \hat x_n|).$$
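The acceptance test sketched above is easy to code. In the following sketch (not from the paper), `np.float32` stands in for "single" precision and `np.float64` for "double" residual accumulation, and the threshold `K` is an illustrative assumption.

```python
import numpy as np

def computed_backward_error(A, b, xhat, acc_dtype):
    # eta-hat = max_i |r_i| / (|a_i1 x_1| + ... + |a_in x_n|), with the
    # residual accumulated in the precision given by acc_dtype
    prods = A.astype(acc_dtype) * xhat.astype(acc_dtype)
    r = b.astype(acc_dtype) - prods.sum(axis=1)
    return float(np.max(np.abs(r) / np.abs(prods).sum(axis=1)))

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)).astype(np.float32)
xtrue = rng.standard_normal(4).astype(np.float32)
b = (A @ xtrue).astype(np.float32)
xhat = np.linalg.solve(A.astype(np.float64), b.astype(np.float64)).astype(np.float32)

u = float(np.finfo(np.float32).eps) / 2     # single-precision unit roundoff
eta_hat = computed_backward_error(A, b, xhat, np.float64)
K = 100.0
print(eta_hat <= K * u)                     # accept the computed answer
```

The accept/reject comparison against $Ku$ is exactly the reliability device described in the text; the choice $K = 100$ is arbitrary here.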

THEOREM 3.4. If $\hat\eta$ is the computed value of $\eta$, then

$$e^{-(n+2)u}\hat\eta - n\bar u e^{n\bar u} \le \eta \le e^{(n+2)u}\hat\eta + n\bar u e^{n\bar u},$$

where

$$\bar u = \begin{cases} u & \text{for single-precision residual accumulation,} \\ u^2 & \text{for double-precision residual accumulation.} \end{cases}$$

PROOF. Let $\hat q$ be the computed value of $A\hat x$. By the standard error analysis for inner products,

$$|\hat q - A\hat x| \le [(1 + \bar u)^n - 1]\,|A|\,|\hat x| \le n\bar u e^{n\bar u}\,|A|\,|\hat x|. \qquad (3.7)$$

The numerator and denominator of $\hat\eta$ are contaminated by at most $n + 2$ further roundings, each of relative size at most $u$, and so

$$(1 + 2u)^{-1}(1 + u)^{-n} \max_i \frac{(|b - \hat q|)_i}{(|A|\,|\hat x|)_i} \le \hat\eta \le (1 + 2u)(1 + u)^n \max_i \frac{(|b - \hat q|)_i}{(|A|\,|\hat x|)_i}.$$

Using $(1 + 2u)(1 + u)^n \le e^{(n+2)u}$, this reduces to

$$e^{-(n+2)u}\hat\eta \le \max_i \frac{(|b - \hat q|)_i}{(|A|\,|\hat x|)_i} \le e^{(n+2)u}\hat\eta.$$

Moreover,

$$|b - \hat q| - |\hat q - A\hat x| \le |r| \le |b - \hat q| + |\hat q - A\hat x|,$$


and (3.7) gives

$$|b - \hat q| - n\bar u e^{n\bar u}|A|\,|\hat x| \le |r| \le |b - \hat q| + n\bar u e^{n\bar u}|A|\,|\hat x|.$$

Dividing by $|A|\,|\hat x|$ and taking the maximum yields

$$\max_i \frac{(|b - \hat q|)_i}{(|A|\,|\hat x|)_i} - n\bar u e^{n\bar u} \le \eta \le \max_i \frac{(|b - \hat q|)_i}{(|A|\,|\hat x|)_i} + n\bar u e^{n\bar u}. \qquad \text{Q.E.D.}$$

Before concluding this section, it should be mentioned that for some classes of problems it may be unreasonable to expect an algorithm to be stable. If the number of output values is fairly large compared to the number of input values, then it becomes very difficult for an algorithm to be stable, because in the definition of stability each output value must arise from the same perturbation of the input values. For example, Miller [14] shows that the usual algorithm for inverting triangular matrices is unstable. Hence it seems better to use stability as a relative rather than as an absolute concept. This idea is used by Miller [15].

4. Gaussian Elimination with Column Pivoting

This section applies the ideas of the preceding sections to Gaussian elimination with partial pivoting using row interchanges and implicit row scaling. (Implicit column scaling would have no effect on the algorithm.) The reciprocals of the scale factors are to be given as inputs $d_1, d_2, \ldots, d_n$ to the algorithm, and so the pivoting is done as if one were solving $D^{-1}Ax = D^{-1}b$ where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$. To keep the notation simple, it is assumed that the equations are numbered according to their ordering after all row interchanges have been performed. The computations of the algorithm are as follows:

$$a^{(1)}_{ij} = a_{ij},$$

$$m_{ik} = a^{(k)}_{ik} \div a^{(k)}_{kk}, \qquad i \ge k + 1, \qquad (4.1)$$

$$a^{(k+1)}_{ij} = a^{(k)}_{ij} - m_{ik} \times a^{(k)}_{kj}, \qquad i, j \ge k + 1, \qquad (4.2)$$

for $k = 1(1)n - 1$;

$$l_{ij} = \begin{cases} 0 & \text{if } i < j, \\ 1 & \text{if } i = j, \\ m_{ij} & \text{if } i > j, \end{cases} \qquad u_{ij} = \begin{cases} a^{(i)}_{ij} & \text{if } i \le j, \\ 0 & \text{if } i > j; \end{cases} \qquad (4.3)$$

$$y_1 = b_1,$$

$$y_i = b_i - (\cdots(l_{i1} \times y_1 + l_{i2} \times y_2)\cdots + l_{i,i-1} \times y_{i-1}), \qquad i = 2(1)n,$$

$$\hat x_n = y_n \div u_{nn},$$

$$\hat x_i = (y_i - (\cdots(u_{i,i+1} \times \hat x_{i+1} + u_{i,i+2} \times \hat x_{i+2})\cdots + u_{in} \times \hat x_n)) \div u_{ii}, \qquad i = n - 1(-1)1, \qquad (4.4)$$

where $+$, $-$, $\times$, $\div$ denote floating-point operations.
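A minimal dense sketch of this algorithm (not the paper's code) might look as follows; the reciprocal scale factors `d` enter only through the pivot comparison, exactly as with the implicit scaling described above.

```python
import numpy as np

def solve_column_pivoting(A, b, d):
    # Gaussian elimination with row interchanges chosen as if solving
    # D^{-1} A x = D^{-1} b, D = diag(d).  No arithmetic is performed
    # with d except the pivot comparison.
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    d = np.abs(np.asarray(d, dtype=float)).copy()
    n = len(b)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(A[k:, k]) / d[k:]))  # scaled pivot choice
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
            d[[k, p]] = d[[p, k]]
        m = A[k+1:, k] / A[k, k]                 # multipliers (4.1)
        A[k+1:, k:] -= np.outer(m, A[k, k:])     # elimination (4.2)
        b[k+1:] -= m * b[k]
    x = np.empty(n)
    for i in range(n - 1, -1, -1):               # back substitution (4.4)
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[3.0, 2.0, 1.0], [2.0, 2e-3, 2e-3], [1.0, 2e-3, -1e-3]])
xtrue = np.array([1e-3, 1.0, 1.0])
b = A @ xtrue
x1 = solve_column_pivoting(A, b, np.ones(3))                  # unscaled pivoting
x2 = solve_column_pivoting(A, b, np.abs(A) @ np.abs(xtrue))   # scaling (4.8) below
print(np.allclose(x1, xtrue), np.allclose(x2, xtrue))
```

The example matrix is an illustrative assumption (a Hamming-style system with $\epsilon = 10^{-3}$); in IEEE double precision both scalings recover the solution, and the difference between them shows up only in the componentwise backward error at coarser precisions.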

It is assumed that the selection of a pivot is done exactly, so that

$$|a^{(k)}_{ik}/d_i| \le |a^{(k)}_{kk}/d_k|, \qquad i \ge k + 1. \qquad (4.5)$$

In cases where there is more than one suitable pivot, the one with the lowest row index is chosen. The assumption of exact pivot choice avoids some minor technical difficulties, and it also makes for a sharp error bound in the case where there is no scaling.

It is important to appreciate the nature of the functional relationship between $\hat x$ and $D$. The computed solution $\hat x$ is a function $\xi(P)$ of the row permutation $P$, which in turn is a function $\Pi(D)$ of the scaling matrix $D$. (Note that $\Pi(D)$ is also defined for values of $D$ that are not floating-point numbers, because the algorithm does not perform floating-point arithmetic on $D$.) If $\hat x$ is viewed as a function defined on $(d_1, d_2, \ldots, d_n)$-space, it is constant over regions bounded by hyperplanes passing through the origin. For example, suppose the values $a^{(k)}_{ik}$ are determined by a certain choice $\bar d_1, \bar d_2, \ldots, \bar d_n$ of scale factors. Then $\hat x$ is constant for all values of $(d_1, d_2, \ldots, d_n)$ that satisfy

$$|d_i| > |a^{(k)}_{ik}/a^{(k)}_{kk}|\,|d_k|, \qquad i > k.$$

Also, any permutation $P$ of the equations which does not result in a pivot exactly equal to zero can be realized by partial pivoting for some scaling $D$ of the equations. This fact underscores the importance of proper scaling.

For the remainder of this section, let $\bar D$ be the given fixed choice of the scale matrix that determines the ordering of the rows, and let $D$ be an arbitrary diagonal matrix whose entries $d_1, d_2, \ldots, d_n$ satisfy (4.5).

We begin by obtaining a totally a priori error bound for Gaussian elimination, from which the other results of this section follow. The proof is modeled after that of Forsythe and Moler [6], which is mostly borrowed from Wilkinson [24]. However, our error bound, like that of Van der Sluis [23], is more informative than that of Forsythe and Moler in that it distinguishes among the columns of $A$.

THEOREM 4.1. Let the vector $\hat x$ be computed by Gaussian elimination with column pivoting and row scaling, where $\bar D = \mathrm{diag}(\bar d_1, \bar d_2, \ldots, \bar d_n)$ is the matrix of reciprocal scale factors. Then

$$\|D^{-1}r\| \le \chi(n)u\,\|\,|D^{-1}A|\,|\hat x|\,\|$$

for arbitrary $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ satisfying (4.5), where

$$\chi(n) = [19 \cdot 2^{n-2} - n - 8]\,e^{2nu}.$$

PROOF. A rather lengthy proof is given in Appendix A. Here we briefly indicate how one might obtain a weaker result more quickly. It is a standard result for partial pivoting without scaling (for example, [6, Ch. 21]) that there exists $\Delta A$ such that $(A + \Delta A)\hat x = b$ with $\|\Delta A\| \le k(n)u\|A\|$ for $u \le \tilde u(n)$. By straightforward adjustments to the proof, the result generalizes to include row and column scaling, so that

$$\|D_1^{-1}\Delta A D_2\| \le k(n)u\,\|D_1^{-1}AD_2\|.$$

Since $D_2$ has no effect on the algorithm, it may be chosen arbitrarily. Set $D_2 = \mathrm{diag}(|\hat x|)$. Q.E.D.

Remark 1. The factor $e^{2nu}$ appears in Forsythe and Moler [6] as the constant 1.01. The advantage of $e^{2nu}$ is that it indicates the nature of the higher-order effects and does not require placing some arbitrary restriction on the size of $nu$.


Remark 2. It is actually possible to show that $|D^{-1}r| \le L_B\,|D^{-1}A|\,|\hat x|$ for some lower triangular matrix $L_B$ satisfying $\|L_B\| \le \chi(n)u$, although the best possible $L_B$ is somewhat complicated.

Remark 3. For pivoting without scaling, Van der Sluis [23, Theorem 1] gives the bound $\|\Delta A\,\mathrm{diag}(|\hat x_j|)\| \le k^*(n)u\,\|A\,\mathrm{diag}(|\hat x_j|)\|$ where $(A + \Delta A)\hat x = b$, which implies that $\|r\| \le nk^*(n)u\,\|\,|A|\,|\hat x|\,\|$. Conversely, Theorem 4.1 with $D = \bar D = I$ implies the result of Van der Sluis if $\Delta A$ is defined by

$$\Delta A = r\,\|\,|A|\,|\hat x|\,\|^{-1}e_I^T\,|A|\,\mathrm{diag}(\mathrm{sgn}\,\hat x),$$

where $I$ is the index of the largest component of $|A|\,|\hat x|$.

The bound of this theorem is almost always extremely pessimistic. In practice $\chi(n)$ is usually at most of order $n$, according to Peters and Wilkinson [18]. However, there are cases where this bound can be attained in the limit as $u \to 0$.

THEOREM 4.2. There exists a problem $Ax = b$ and a floating-point arithmetic such that the solution $\hat x$ computed by Gaussian elimination with partial or complete pivoting satisfies

$$\frac{\|r\|}{\|\,|A|\,|\hat x|\,\|} = [19 \cdot 2^{n-2} - n - 8]u + O(u^2).$$

Therefore, the bound of Theorem 4.1 is the best possible bound up to first-order terms in $u$.

PROOF. See Appendix A for the proof, which employs a modification of Wilkinson's [24] example. Q.E.D.

4.1 SCALING FOR NUMERICAL STABILITY. The example at the end of Section 1 shows

that Gaussian elimination with pivoting is unstable. This instability can be explained as follows. Suppose that $L = (l_{ij})$ and $U = (u_{ij})$ are computed exactly, so that $A = LU$. Stewart [21, Ch. 3] shows that the computed solution $\hat x$ is determined from $L$ and $U$ in a stable way, so that $(L + \delta L)(U + \delta U)\hat x = b$ where $|\delta L| \le K(n)u|L|$ and $|\delta U| \le K(n)u|U|$ for some constant $K(n)$ depending only on $n$. For stability of the algorithm as a whole, it would be sufficient for $\Delta A = \delta L\,U + (L + \delta L)\delta U$ to be small relative to $A$ in the componentwise sense. However, it only follows that

$$|\Delta A| \le (2 + K(n)u)K(n)u\,|L|\,|U|,$$

and because of the possibility of cancellation in the formation of $A$ from $L$ times $U$, $\Delta A$ may have elements that are large relative to the corresponding elements of $A$. That is, a small perturbation of the $LU$ factorization introduced by backward error analysis may result in a large, and even infinite, perturbation of $A$. Thus it is the possibility of $A$ being an arbitrarily badly conditioned function of $L$ and $U$ which accounts for the instability of the algorithm.

For the example given in Section 1,

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \qquad L = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}.$$

This factorization is "stable" since $LU = A$ exactly. It is the roundoff errors of back substitution that make the algorithm unstable, because

$$|L|\,|U| = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

has a nonzero element corresponding to a zero element of $A$. For this particular example an interchange of rows or columns makes $A$ a well-conditioned function of $L$ and $U$ and hence stabilizes the Gaussian elimination. However, there exist matrices like

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$


for which no permutation of columns and rows results in factors $L$ and $U$ such that $|L|\,|U|$ has the same sparsity structure as $LU$. In fact, this example can be used to establish an even stronger result:

THEOREM 4.3. Gaussian elimination is not asymptotically stable for any pivoting strategy that depends only on the coefficient matrix.

PROOF. Let a pivoting strategy be given. For the $3 \times 3$ coefficient matrix $A$ above, Gaussian elimination with the given pivoting strategy is equivalent to Gaussian elimination without pivoting for some permuted matrix $PAQ^T$, where $P$ and $Q$ are permutation matrices. There are four possibilities to consider for $PAQ^T$:

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$

Let $\epsilon$ be an arbitrary positive real and define

$$Pb = \begin{cases} (1, 1, -\epsilon)^T & \text{in the first two cases,} \\ (1, -\epsilon, 1)^T & \text{in the last two cases.} \end{cases}$$

Let arbitrary $u > 0$ be given, and consider an arithmetic such that $(-\epsilon) - 1$ is computed as $-\epsilon - 1 + u/2$ and such that all other operations used in solving $(PAQ^T)(Qx) = Pb$ are exact. The solution $\hat x$ computed by Gaussian elimination satisfies

$$Q\hat x = \begin{bmatrix} -\epsilon/2 + u/4 \\ -\epsilon/2 + u/4 \\ 1 + \epsilon/2 - u/4 \end{bmatrix}$$

in the first and third cases and

$$Q\hat x = \begin{bmatrix} -\epsilon/2 + u/4 \\ 1 + \epsilon/2 - u/4 \\ -\epsilon/2 + u/4 \end{bmatrix}$$

in the second and fourth cases. For all possibilities the backward error $\eta_2 = \min\{u/(4\epsilon - u),\, 1\}$, which does not admit a bound of the form $Ku$ where $K$ is independent of $\epsilon$. Q.E.D.

Remark. The floating-point arithmetic used in the proof of this theorem is unrealistic except for certain values of $\epsilon$ such as $\epsilon = u/2$. However, the validity of the proof depends on $\epsilon$ being chosen independent of $u$; otherwise, it does not follow that the algorithm is not asymptotically stable but only that it is unstable. Undoubtedly it is possible to prove this theorem using more realistic counterexamples, although they would very likely be more complicated.

Let us examine more closely the conditions associated with a very large backward error. The factorization of $A$ is done in $n - 1$ stages involving the formation of the values $a^{(k+1)}_{ij} = a^{(k)}_{ij} - m_{ik} \times a^{(k)}_{kj}$. In our backward error analysis we must perturb $a^{(k)}_{ij}$ so that it satisfies $a^{(k)}_{ij} = m_{ik}a^{(k)}_{kj} + a^{(k+1)}_{ij}$ exactly for perturbed values of $m_{ik}$, $a^{(k)}_{kj}$, and $a^{(k+1)}_{ij}$. If there is cancellation in this sum, then a very large perturbation of $a^{(k)}_{ij}$ may result. Thus the condition $|m_{ik}a^{(k)}_{kj}| \gg |a^{(k)}_{ij}|$ indicates the possibility that the computed solution will have a very large backward error. The extreme case occurs when $a^{(k)}_{ij} = 0$ and $a^{(k+1)}_{ij} \ne 0$, which is commonly called "fill-in." For sparse systems of equations it is quite common to order the rows so as to avoid fill-in. This reduces computational cost, and it apparently may also contribute to stability in the sense of Section 3.
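The fill-in condition just described can be checked numerically: comparing $|L|\,|U|$ with $|A|$ entry by entry exposes positions where cancellation must have occurred. A small sketch (not from the paper; it assumes the row ordering is already fixed, so no pivoting is performed):

```python
import numpy as np

def componentwise_growth(A):
    # LU factorization without pivoting, returning |L||U|; entries that are
    # large where |A| is small (or nonzero where a_ij = 0, i.e. fill-in)
    # flag possible large componentwise backward error.
    A = A.astype(float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for k in range(n - 1):
        L[k+1:, k] = U[k+1:, k] / U[k, k]
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
        U[k+1:, k] = 0.0
    return np.abs(L) @ np.abs(U)

A = np.array([[1.0, 1.0], [1.0, 0.0]])
print(componentwise_growth(A))   # the (2,2) entry is 2 although a_22 = 0
```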

The instability of Gaussian elimination has been pointed out by Hamming [9], who on page 119 announces the "Theorem. Pivoting can take a well-conditioned system into an ill-conditioned system of simultaneous linear equations" and on page 123 states, "we have not justified the pivoting method; rather we have shown that it is an 'old wives' tale.' But like most old wives' tales, it is a mixture of truth and mystic faith." To prove this theorem,


Hamming uses the example discussed at the end of Section 2. For this example, one elimination step with partial or complete pivoting yields the system $A'x = b'$, where

$$A' = \begin{bmatrix} 3 & 2 & 1 \\ 0 & -\frac{4}{3} + 2\epsilon & -\frac{2}{3} + 2\epsilon \\ 0 & -\frac{2}{3} + 2\epsilon & -\frac{1}{3} - \epsilon \end{bmatrix}, \qquad b' = \begin{bmatrix} 3 + 3\epsilon \\ -2 + 4\epsilon \\ -1 + \epsilon \end{bmatrix}, \qquad (4.6)$$

assuming exact arithmetic. This problem is ill conditioned for small $\epsilon$, since

$$\mathrm{Cond}(A', x) = \frac{0.8\epsilon^{-1} - 3 + 3\epsilon}{1 - 1.8\epsilon}.$$

If the elimination were performed in floating-point arithmetic, then a slight perturbation of (4.6) could result, which may have a solution that differs from the true solution by an amount proportional to $\epsilon^{-1}$. This kind of error could not arise from slightly perturbing the original problem, because it has a condition number of about 6. For example, suppose that the computed right-hand side of (4.6) were

$$\hat b' = \begin{bmatrix} 3 + 3\epsilon \\ -2 + 4\epsilon - (u/4) \\ -1 + \epsilon + (u/2) \end{bmatrix}$$

and everything else were exact. Then

$$\hat x = \begin{bmatrix} \epsilon \\ 1 + \epsilon^{-1}u/8 \\ 1 - \epsilon^{-1}u/4 \end{bmatrix},$$

and by Theorem 3.2 the backward error would be $u/(8\epsilon + u)$.

A related observation was made by Gear [8]:

"It might be possible to say that $\delta A$ represents a perturbation to the original physical problem if the sparsity structure of $\delta A$ were the same as that of $A$. Unfortunately, we will show that such a demand on the structure of $\delta A$ can lead to very large bounds on $\|\delta A\|$, bounds probably dependent on the condition number of $A$."

This was supported by the example

$$b = \begin{bmatrix} -2 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \qquad x = \begin{bmatrix} 0 \\ \epsilon^{-1} \\ \epsilon^{-1} \\ 2 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 0 & \epsilon & 0 & 0 \\ 0 & 0 & \epsilon & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

for which $\mathrm{Cond}(A) = 4$. The $2 \times 2$ example in the introduction shows that the situation can be even worse than Gear suggests, for $\|\delta A\|/u \to +\infty$ as $u \to 0$ even though $\kappa(A) = 4$. On the other hand, it is shown in Theorem 4.7 at the end of this subsection that small bounds on $\|\delta A\|$ can be achieved for an appropriate scaling of the equations, regardless of the condition number of $A$.

The next theorem bounds the backward error by a quantity that clearly exhibits the effect of scaling. The assumption is made that the rows of $A$ corresponding to vanishing components of $|A|\,|\hat x|$ were chosen as pivot rows as soon as possible.

THEOREM 4.4. Let $J = \{i : (|A|\,|\hat x|)_i = 0\}$. If $a^{(k)}_{ik} = 0$ for all $i > k$ such that $i \in J$ and $k \notin J$, then the backward error

$$\eta \le \chi(n)u\,\frac{\max\,|D^{-1}A|\,|\hat x|}{\min_{i \notin J}\,(|D^{-1}A|\,|\hat x|)_i}.$$

PROOF.

Fix $D$ and let $D_\theta = \mathrm{diag}(d_1^\theta, d_2^\theta, \ldots, d_n^\theta)$ be defined by

$$d_i^\theta = \begin{cases} \theta d_i & \text{if } i \in J, \\ d_i & \text{otherwise.} \end{cases}$$

First we show that the diagonal entries of $D_\theta$ satisfy (4.5) for $0 < \theta \le 1$. This is clearly true for $i, k \in J$ and for $i, k \notin J$. Also, if $i \in J$ and $k \notin J$, then (4.5) follows from $a^{(k)}_{ik} = 0$. Finally, if $i \notin J$ and $k \in J$, then $|a^{(k)}_{ik}/d_i^\theta| = |a^{(k)}_{ik}/d_i| \le |a^{(k)}_{kk}/d_k| \le |a^{(k)}_{kk}/d_k^\theta|$. Therefore, we may apply Theorem 4.1 with $D = D_\theta$ to get

$$\|D_\theta^{-1}r\| \le \chi(n)u\,\|\,|D_\theta^{-1}A|\,|\hat x|\,\|. \qquad (4.7)$$

Multiplying this by $\theta$ and letting $\theta \to 0$ yields

$$\max_{i \in J}\,(|D^{-1}r|)_i \le \chi(n)u\,\max_{i \in J}\,(|D^{-1}A|\,|\hat x|)_i = 0.$$

Setting $\theta = 1$ in (4.7) yields for $i \notin J$

$$(|D^{-1}r|)_i \le \chi(n)u\,\frac{\|\,|D^{-1}A|\,|\hat x|\,\|}{(|D^{-1}A|\,|\hat x|)_i}\,(|D^{-1}A|\,|\hat x|)_i.$$

Together these last two inequalities imply

$$|r| \le \chi(n)u\,\frac{\max\,|D^{-1}A|\,|\hat x|}{\min_{i \notin J}\,(|D^{-1}A|\,|\hat x|)_i}\,|A|\,|\hat x|.$$

Applying Theorem 3.2 concludes the proof. Q.E.D.

By choosing

$$d_i = (|A|\,|\hat x|)_i \qquad (4.8)$$

the bound on the backward error is minimized, giving $\eta \le \chi(n)u$. This suggests that a linear system should be scaled by dividing each row by its weighted $l_1$ norm, where the weights are the components of the computed solution. In the unlikely event of zero scale factors, one should use instead extremely small nonzero numbers, so small that the corresponding rows are certain to be selected for pivoting at the first opportunity. Unfortunately, (4.8) represents an implicit equation for the scale factors $d_i$, because the computed solution $\hat x$ is a function $\xi(\Pi(D))$ of the scaling matrix $D$; that is, $D$ must solve the equation

$$D = \mathrm{diag}(|A|\,|\xi(\Pi(D))|), \qquad (4.9)$$

for which a solution may not exist. The nature of this equation becomes more apparent by noting that it is equivalent to solving for a permutation $P$ that satisfies

$$P = \Pi(\mathrm{diag}(|A|\,|\xi(P)|)). \qquad (4.10)$$

For if $D$ satisfies (4.9), then $P = \Pi(D)$ satisfies (4.10); and if $P$ satisfies (4.10), then $D = \mathrm{diag}(|A|\,|\xi(P)|)$ satisfies (4.9). In principle we could determine the solution to (4.9), if there is one, by testing to see if any of the $n!$ permutations $P$ satisfy (4.10). In the cases where a solution exists, the backward error is bounded by $\chi(n)u$. One suspects that (4.10) almost always has a solution, and it is even conceivable that (4.10) always has a solution, at least whenever $\xi(P)$ is defined for all $P$. The existence of a solution for (4.10) implies the existence of an ordering for the rows which makes Gaussian elimination stable.

If one wishes to solve (4.10), the following iteration would likely converge and converge quickly for almost every system of equations:

P(o) = II(diag(I Ale)), P ( ,+ , = II(diag(lA116(e<m))l)), m = 1, 2 ....

This is not suggested as a practical algorithm though because poorly scaled equations can usually be accurately solved by doing iteratwe improvement.
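As a sketch only (the text does not recommend this as a practical algorithm), the iteration can be imitated by explicitly rescaling the rows before each solve; explicit scaling selects the same pivots as implicit scaling, though its rounding differs slightly. The example system is an illustrative assumption.

```python
import numpy as np

def iterate_row_scaling(A, b, sweeps=4):
    # Start from d = |A| e; thereafter d = |A| |xhat|, as in (4.9)-(4.10).
    d = np.abs(A).sum(axis=1)
    for _ in range(sweeps):
        xhat = np.linalg.solve(A / d[:, None], b / d)  # partial-pivoting solver
        d = np.abs(A) @ np.abs(xhat)
    return xhat, d

A = np.array([[3.0, 2.0, 1.0], [2.0, 2e-3, 2e-3], [1.0, 2e-3, -1e-3]])
xtrue = np.array([1e-3, 1.0, 1.0])
xhat, d = iterate_row_scaling(A, A @ xtrue)
print(np.allclose(xhat, xtrue))
```

When the iteration settles, `d` equals $|A|\,|\hat x|$ for the final solve, which is the scaling (4.8).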

A more useful application of Theorem 4.4 is the diagnosis of ill scaling, for

$$\sigma_R(A, \hat x) = \frac{\max\,|A|\,|\hat x|}{\min\,|A|\,|\hat x|}$$

is an easily computable measure of how badly scaled the rows are.

Remark. The second remark after Theorem 4.1 implies that

$$(|D^{-1}r|)_i \le \chi(n)u\,\max_{j \le i}\,(|D^{-1}A|\,|\hat x|)_j.$$

This refined bound suggests that


$$\max_i\,\frac{\max_{j \le i}\,(|A|\,|\hat x|)_j}{(|A|\,|\hat x|)_i}$$

would be a better a posteriori measure of the possible effect of ill scaling.

The quantity $\sigma_R(A, \hat x)$ is not very satisfactory for theoretical purposes, because $\hat x$ depends on the arithmetic used in the computation. We prefer to use $\sigma_R(A, x)$ for the theory. For Hamming's example, $|A|\,|x| = (3 + 3\epsilon, 6\epsilon, 4\epsilon)^T$ and $\sigma_R(A, x) = \frac{3}{4}\epsilon^{-1} + \frac{3}{4}$. Near-optimal row scaling for this problem is given by

$$D^{-1}A = \begin{bmatrix} 3 & 2 & 1 \\ 2/\epsilon & 2 & 2 \\ 1/\epsilon & 2 & -1 \end{bmatrix}, \qquad D^{-1}b = \begin{bmatrix} 3 + 3\epsilon \\ 6 \\ 2 \end{bmatrix}.$$

For Gear's example, $|A|\,|x| = (2/\epsilon + 2, 1, 1, 2)^T$ and $\sigma_R(A, x) = 2/\epsilon + 2$. Near-optimal row scaling is given by

$$D^{-1}A = \begin{bmatrix} \epsilon & \epsilon & -\epsilon & -\epsilon \\ 0 & \epsilon & 0 & 0 \\ 0 & 0 & \epsilon & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad D^{-1}b = \begin{bmatrix} -2\epsilon \\ 1 \\ 1 \\ 2 \end{bmatrix}.$$

One may wonder about the effect of scaling strategies such as row equilibration. Van der Sluis [23, p. 80] gives an example showing "that it is quite possible ... that there exists no bound depending on $n$ only for the ratios of the errors after and before equilibration." He goes on to describe a cautious equilibration scheme that never worsens the situation, at the expense of possibly not improving it. An adaptation of this scheme to our theory is to choose

$$d_i = \min_k \max_j \left|\frac{a_{ij}}{a_{kj}}\right|,$$

which has the effect of leaving no row of $A$ strictly dominated by any other row of $A$. Note that since $d_i \le 1$, we have $\min |D^{-1}A|\,|x| \ge \min |A|\,|x|$. Furthermore,

$$(|A|\,|x|)_i = \sum_j \frac{|a_{ij}|}{|a_{kj}|}\,|a_{kj}|\,|x_j| \le \max_j \left|\frac{a_{ij}}{a_{kj}}\right|\,(|A|\,|x|)_k \le \max_j \left|\frac{a_{ij}}{a_{kj}}\right|\,\max |A|\,|x|$$

for every $k$, and minimizing over $k$ gives $(|A|\,|x|)_i \le d_i \max |A|\,|x|$, whence $\max |D^{-1}A|\,|x| \le \max |A|\,|x|$. Therefore

$$\sigma_R(D^{-1}A, x) = \frac{\max\,|D^{-1}A|\,|x|}{\min\,|D^{-1}A|\,|x|} \le \frac{\max\,|A|\,|x|}{\min\,|A|\,|x|} = \sigma_R(A, x),$$

so that the scaling of the problem is not made worse by our choice of $D$.

The theorem that follows gives bounds on the backward error that in the limit $u \to 0$ depend only on how ill scaled the problem is and not on how ill conditioned it is. First we need a lemma.


LEMMA 4.5. Under the hypotheses of Theorem 4.1,

$$\|D^{-1}r\| \le \frac{\chi(n)u\,\|\,|D^{-1}A|\,|x|\,\|}{1 - \chi(n)u\,\mathrm{Cond}(A^{-1}D)},$$

provided that the denominator is positive.

PROOF. Applying Theorem 4.1 gives

$$\|D^{-1}r\| \le \chi(n)u\,\|\,|D^{-1}A|\,|x - A^{-1}r|\,\| \le \chi(n)u\,\|\,|D^{-1}A|\,|x|\,\| + \chi(n)u\,\mathrm{Cond}(A^{-1}D)\,\|D^{-1}r\|. \qquad \text{Q.E.D.}$$

THEOREM 4.6. Let $|A|\,|x| > 0$. Gaussian elimination with column pivoting gives

$$\eta \le \frac{\chi(n)u\,\sigma_R(D^{-1}A, x)}{1 - \chi(n)u\,\mathrm{Cond}(A^{-1}D)\,(\sigma_R(D^{-1}A, x) + 1)},$$

provided that the denominator is positive.

PROOF. We have $|D^{-1}r| \le \|D^{-1}r\|e$. The given bound on $\eta$ follows from Theorem 3.2 if we show that $\|D^{-1}r\|e$ is bounded by the appropriate scalar multiple of $|D^{-1}A|\,|\hat x|$. From Lemma 4.5 we have

$$\|D^{-1}r\|e \le \chi(n)u\,\|\,|D^{-1}A|\,|x|\,\|e + \chi(n)u\,\mathrm{Cond}(A^{-1}D)\,\|D^{-1}r\|e.$$

By definition,

$$\|\,|D^{-1}A|\,|x|\,\|e \le \sigma_R(D^{-1}A, x)\,|D^{-1}A|\,|x|.$$

Since $x = \hat x + A^{-1}r$, we have that

$$|D^{-1}A|\,|x| \le |D^{-1}A|\,|\hat x| + \mathrm{Cond}(A^{-1}D)\,\|D^{-1}r\|e.$$

Combining these last three inequalities and solving for $\|D^{-1}r\|e$ yields the appropriate bound. Q.E.D.

Although we are unable to prove that there is always some ordering of the rows for which Gaussian elimination is stable, we can show that this is true asymptotically as $u \to 0$.

THEOREM 4.7. For any problem such that $|A|\,|x| > 0$ there is some ordering of the rows for which Gaussian elimination is asymptotically stable.

PROOF. Using Theorem 4.6 with $D = \mathrm{diag}(|A|\,|x|)$ gives

$$\eta \le \frac{\chi(n)u}{1 - 2\chi(n)u\,\max\,\left(|A|\,|A^{-1}|\,|A|\,|x| \,/\, |A|\,|x|\right)}$$

for small enough $u$. Hence for

$$u \le \tilde u(n) = \min \frac{|A|\,|x|}{4\chi(n)\,|A|\,|A^{-1}|\,|A|\,|x|}$$

(the maximum and minimum taken componentwise over the ratio vectors) we have

$$\eta \le 2\chi(n)u. \qquad \text{Q.E.D.}$$

4.2 SCALING FOR ACCURACY. By "accuracy" we mean that the computed solution is as accurate (as measured by our norm) as the solution of a slightly different problem. More precisely, we might require that

$$\frac{\|\hat x - x\|}{\|x\|} \le K(n)u\,\mathrm{Cond}(A, x) + O(u^2),$$

this being the bound for the change in the solution due to relative changes in the data of not more than $K(n)u$. Here $K(n)$ is some constant independent of the problem data. Obviously this is a weaker requirement than stability, since it is implied by stability. This concept is informally described as "essential numerical stability" by Peters and Wilkinson


[18]. A more formal definition of this concept appears in a recent paper by Jankowski and Wozniakowski [10].

The following theorem gives a good bound for the "forward" error in terms of a diagonal matrix $D$ satisfying (4.5).

THEOREM 4.8. The error

$$\|\hat x - x\| \le \frac{\chi(n)u\,\|A^{-1}D\|\,\|\,|D^{-1}A|\,|x|\,\|}{1 - \chi(n)u\,\mathrm{Cond}(A^{-1}D)}.$$

PROOF. We have

$$\|\hat x - x\| = \|A^{-1}(-r)\| \le \|A^{-1}D\|\,\|D^{-1}r\|,$$

and the theorem follows from Lemma 4.5. Q.E.D. Ignoring higher-order terms in u, let us determine the choice of D that minimizes

the error bound. Since scalar multiples of D are also optimal, we normalize so that II ID-XAI Ixl II ffi 1; that is, [d,,[ >_ ([AI Ixl), with equality for at least one i. It remains to minimize [[A-ID[[ subject to these constraints. Obviously, D should be chosen as small as possible; that is, D = diag([A [ Ix D. In this case the error

I1~ - xll-< x(n)u II Ia - l l IAI Ixl II + O(u2), and so the solution satisfies our accuracy requirement. Thus

IIa-'ll II Ihl Ixl II s(h, x) ffi II IA-'I IAI Ixl II

is a measure of the possible effect on the "forward" error of how poorly the equations happen to be scaled. For Hamming's example this quantity is %7 E -~ + O(1); and for Gear's example it is E -t + O(1) (which probably overestimates the effect on the relative error because the reduced systems obtained by any reasonable approximation to column pivoting are not ill conditioned).

However, for the 2 × 2 example of Section 1, s (A, x) = 2 even though a(A, x) = +oo. A near-optimal scaling is given by

where ~ >> 1. We note that s(A, x) ffi 1 + 1/~, and so there is a wide range of reasonably good solutions to the problem of scaling for accuracy. Hence the particular scaling determined by minimizing the error bound may be sensitive to the choice of norm. For other norms we can obtain error bounds similar to those of Theorem 4.8 because of the equivalence of norms. The most convenient choices are the 1~ and 1~ norms, which are the extreme cases of the Holder norms. Having done the analysis for the 1~ norm, we consider the minimization of IIZ-'Olll II I D-~AI Ixl II1. In this case we normalize so that IIA-~DII~ ffi 1; that is, Id??l ~-(eTIA-~I), with equality for at least one i. It remains to minimize II ID-~AI Ixl I1~ subject to these constraints. This time D -~ should be chosen as small as possible, namely, D -~ = diag(erlA-~ D, which yields a mimmum value of II IA-~I Ihl Ixl I1~. Clearly, Theorem 4.3 does not apply to "accuracy" in that there exists a scaling, and hence a pivoting strategy, depending only on the coeffioent matrix for which Gaussian elimination is "accurate."
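The two measures used above, the scaling measure $s(A, x)$ and the condition number $\mathrm{Cond}(A, x)$, can be evaluated directly for small dense matrices (max norm throughout). This is a sketch, not from the paper, and the $2 \times 2$ input is an illustrative assumption.

```python
import numpy as np

def cond_skeel(A, x):
    # Cond(A, x) = || |A^{-1}| |A| |x| || / ||x||   (max norm)
    v = np.abs(np.linalg.inv(A)) @ (np.abs(A) @ np.abs(x))
    return v.max() / np.abs(x).max()

def s_measure(A, x):
    # s(A, x) = ||A^{-1}|| * || |A||x| || / || |A^{-1}| |A| |x| ||
    Ainv = np.abs(np.linalg.inv(A))
    v = np.abs(A) @ np.abs(x)
    return Ainv.sum(axis=1).max() * v.max() / (Ainv @ v).max()

A = np.array([[1.0, 1.0], [1.0, 0.0]])
x = np.array([0.0, 1.0])
print(s_measure(A, x))   # 2.0: modest, even though sigma_R(A, x) is infinite here
```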

We end this section by noting that the usual type of bound on the error has the form

$$\|\hat x - x\| \le \chi(n)u\,\|A^{-1}D\|\,\|D^{-1}A\|\,\|x\| + O(u^2).$$

For the max norm this is minimized by $D = \mathrm{diag}(|A|e)$, which is just row equilibration with the $l_1$ norm. According to this rule the example of Hamming is properly scaled, and yet it was shown in the preceding subsection that Gaussian elimination with pivoting would give poor results.


5. Gaussian Elimination with Row Pivoting

This section is similar to the previous section, except that we examine the variant of Gaussian elimination in which the columns are interchanged in order to ensure that the pivot element is the largest in its row. The algorithm is assumed to do column scaling, where the scale factors $d_1, d_2, \ldots, d_n$ are given as inputs to the algorithm. Again the selection of the pivot is assumed to be done exactly, so that

$$|a^{(k)}_{kj}d_j| \le |a^{(k)}_{kk}d_k|, \qquad j \ge k + 1. \qquad (5.1)$$

An a priori error bound is given by the following theorem:

THEOREM 5.1. Let the vector $\hat x$ be computed by Gaussian elimination with row pivoting and column scaling, where $\bar D = \mathrm{diag}(\bar d_1, \bar d_2, \ldots, \bar d_n)$ is the matrix of scale factors. Then

$$|r| \le \tilde\chi(n)u\,|AD|e\,\|D^{-1}\hat x\|$$

for arbitrary $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ satisfying (5.1), where

$$\tilde\chi(n) = [27 \cdot 2^{n-2} - 5n - 7]\,e^{2nu}.$$

PROOF. A rather lengthy proof is given in Appendix B. Here we briefly indicate how one might obtain a weaker result more quickly. The standard result for column pivoting can also be proved for row pivoting by noting that the ratios $u_{kj}/u_{kk}$, $j > k$, are bounded by one and play the role that the multipliers $m_{ik}$ play for column pivoting. And again the result generalizes to include row and column scaling, so that $\|D_1^{-1}\Delta A D_2\| \le \tilde k(n)u\,\|D_1^{-1}AD_2\|$. Since $D_1$ has no effect on the algorithm, it may be chosen arbitrarily, which leads to $|\Delta A D_2|e \le \tilde k(n)u\,|AD_2|e$. The theorem follows from $|r| \le |\Delta A D_2|e\,\|D_2^{-1}\hat x\|$. Q.E.D.

Remark 1. It is believed that the constant $\tilde\chi(n)$ in this bound can be replaced by a smaller constant.

Remark 2. It is actually possible to show that $|r| \le |AD|\,U_B\,|D^{-1}\hat x|$ for some upper triangular matrix $U_B$ satisfying $\|U_B\| \le \tilde\chi(n)u$.

Remark 3. From Theorem 5.1 it follows that there exists $\Delta A$ such that $(A + \Delta A)\hat x = b$ with $|\Delta A D|e \le \tilde\chi(n)u\,|AD|e$; simply define

$$\Delta A = r\,\|D^{-1}\hat x\|^{-1}e_I^T\,\mathrm{diag}(\mathrm{sgn}(D^{-1}\hat x))\,D^{-1},$$

where $I$ is the index of the largest component of $|D^{-1}\hat x|$.

5.1 SCALING FOR NUMERICAL STABILITY. The following theorem indicates how the

columns should be scaled so that Gaussian elimination with row pivoting is stable. The assumption is made that the columns of $A$ corresponding to vanishing components of $\hat x$ were chosen as pivot columns as late as possible.

THEOREM 5.2. Let $J = \{i : (|A|\,|\hat x|)_i = 0\}$ and let $\tilde J = \{j : \hat x_j = 0\}$. If $a^{(k)}_{kj} = 0$ for all $j > k$ such that $k \in \tilde J$ and $j \notin \tilde J$, then the backward error

$$\eta \le \tilde\chi(n)u\,\max_{i \notin J}\,\frac{(|AD_0|e)_i}{(|A|\,|\hat x|)_i}\,\|D^{-1}\hat x\|,$$

where $D_0$ is $D$ with the $j$th diagonal entries set to $0$ for $j \in \tilde J$.

PROOF. Fix $D$, and let $D_\theta = \mathrm{diag}(d_1^\theta, d_2^\theta, \ldots, d_n^\theta)$ be defined by

$$d_j^\theta = \begin{cases} \theta d_j & \text{if } j \in \tilde J, \\ d_j & \text{otherwise.} \end{cases}$$

First we show that the diagonal entries of $D_\theta$ satisfy (5.1) for $0 < \theta \le 1$. This is clearly true for $j, k \in \tilde J$ and for $j, k \notin \tilde J$. Also, if $j \notin \tilde J$ and $k \in \tilde J$, then (5.1) follows from $a^{(k)}_{kj} = 0$. Finally, if $j \in \tilde J$ and $k \notin \tilde J$, then $|a^{(k)}_{kj}d_j^\theta| = \theta|a^{(k)}_{kj}d_j| \le |a^{(k)}_{kj}d_j| \le |a^{(k)}_{kk}d_k| = |a^{(k)}_{kk}d_k^\theta|$. Therefore, we may apply Theorem 5.1 with $D = D_\theta$ to get


$$|r| \le \tilde\chi(n)u\,|AD_\theta|e\,\|D_\theta^{-1}\hat x\|.$$

Letting $\theta \to 0$ yields

$$|r| \le \tilde\chi(n)u\,|AD_0|e\,\|D^{-1}\hat x\| \qquad (5.2)$$

(note that $\|D_\theta^{-1}\hat x\| = \|D^{-1}\hat x\|$, since $\hat x_j = 0$ for $j \in \tilde J$). Let $i \in J$. For $j \notin \tilde J$ we have from $(|A|\,|\hat x|)_i = 0$ that $a_{ij} = 0$, and for $j \in \tilde J$ we have that $d_j^0 = 0$. Hence $(|AD_0|e)_i = 0$, and so $r_i = 0$. This and (5.2) imply

$$|r| \le \tilde\chi(n)u\,\max_{i \notin J}\,\frac{(|AD_0|e)_i}{(|A|\,|\hat x|)_i}\,\|D^{-1}\hat x\|\,|A|\,|\hat x|.$$

Applying Theorem 3.2 completes the proof. Q.E.D.

COROLLARY. The backward error

$$\eta \le \tilde\chi(n)u\,\frac{\max\,|D^{-1}\hat x|}{\min_{j \notin \tilde J}\,(|D^{-1}\hat x|)_j}.$$

PROOF. We have

$$|D_0|e \le \frac{|\hat x|}{\min_{j \notin \tilde J}\,(|D^{-1}\hat x|)_j}. \qquad \text{Q.E.D.}$$

A choice of $D$ which minimizes the bound on the backward error is

$$d_i = \hat x_i;$$

that is, we scale by multiplying the $i$th column by the $i$th component of the computed solution. In the unlikely event of zero scale factors, one should use instead extremely small nonzero numbers, so small that the corresponding columns are certain not to be selected until necessary. Again these weights are not known at the time when scaling is performed. The main value of this theorem is that it gives an easily computable measure of column ill scaling:

$$\sigma_C(A, \hat x) = \max_i\,\frac{(|A|e)_i\,\|\hat x\|}{(|A|\,|\hat x|)_i}.$$
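The column measure is as easy to compute as its row counterpart. A sketch (max norm; the test matrices are illustrative assumptions):

```python
import numpy as np

def sigma_col(A, xhat):
    # sigma_C(A, xhat) = max_i (|A|e)_i ||xhat|| / (|A||xhat|)_i
    v = np.abs(A) @ np.abs(xhat)
    return float(np.max(np.abs(A).sum(axis=1) * np.max(np.abs(xhat)) / v))

A = np.array([[1.0, 1000.0], [1.0, 1.0]])
print(sigma_col(A, np.array([1.0, 1.0])))      # 1.0: well scaled for this xhat
print(sigma_col(A, np.array([1000.0, 1.0])))   # 500.5: ill scaling shows up
```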

For theoretical purposes we would prefer to use $\sigma_C(A, x)$. For Hamming's example, $\sigma_C(A, x) = \frac{1}{3}\epsilon^{-1} + \frac{2}{3}$. Near-optimal row and column scaling, which would be appropriate for complete pivoting, is given by

$$D_1^{-1}AD_2 = \begin{bmatrix} 3\epsilon & 2 & 1 \\ 2 & 2 & 2 \\ 1 & 2 & -1 \end{bmatrix}, \qquad D_1^{-1}b = \begin{bmatrix} 3 + 3\epsilon \\ 6 \\ 2 \end{bmatrix}.$$

For Gear's example, $\sigma_C(A, x) = 1/\epsilon$. Near-optimal row and column scaling is given by

$$D_1^{-1}AD_2 = \begin{bmatrix} \epsilon & 1 & -1 & -\epsilon \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad D_1^{-1}b = \begin{bmatrix} -2\epsilon \\ 1 \\ 1 \\ 2 \end{bmatrix}.$$

LEMMA 5.3. Under the hypotheses of Theorem 5.1,

$$\|D^{-1}\hat x\| \le \frac{\|D^{-1}x\|}{1 - \tilde\chi(n)u\,\mathrm{Cond}(AD)},$$

provided that the denominator is positive.

PROOF. Applying Theorem 5.1 gives

$$\|D^{-1}\hat x\| = \|D^{-1}(x - A^{-1}r)\| \le \|D^{-1}x\| + \tilde\chi(n)u\,\mathrm{Cond}(AD)\,\|D^{-1}\hat x\|. \qquad \text{Q.E.D.}$$


THEOREM 5.4. Let $|A|\,|x| > 0$. Gaussian elimination with row pivoting gives

$$\eta \le \frac{\tilde\chi(n)u\,\sigma_C(AD, D^{-1}x)}{1 - \tilde\chi(n)u\,\mathrm{Cond}(AD)\,(\sigma_C(AD, D^{-1}x) + 1)},$$

provided that the denominator is positive.

PROOF. From Theorem 5.1 we have $|r| \le \tilde\chi(n)u\,|AD|e\,\|D^{-1}\hat x\|$. The given bound on $\eta$ follows from Theorem 3.2 if we show that $|AD|e\,\|D^{-1}\hat x\|$ is bounded by the appropriate scalar multiple of $|AD|\,|D^{-1}\hat x|$. From Lemma 5.3 we have

$$|AD|e\,\|D^{-1}\hat x\| \le |AD|e\,\|D^{-1}x\| + \tilde\chi(n)u\,\mathrm{Cond}(AD)\,|AD|e\,\|D^{-1}\hat x\|.$$

By definition,

$$|AD|e\,\|D^{-1}x\| \le \sigma_C(AD, D^{-1}x)\,|AD|\,|D^{-1}x|.$$

Since $x = \hat x + A^{-1}r$, we have that

$$|D^{-1}x| \le |D^{-1}\hat x| + \tilde\chi(n)u\,\mathrm{Cond}(AD)\,e\,\|D^{-1}\hat x\|.$$

Combining these last three inequalities and solving for $|AD|e\,\|D^{-1}\hat x\|$ yields the appropriate bound. Q.E.D.

THEOREM 5.5. For any problem such that $|x| > 0$ there is some ordering of the columns for which Gaussian elimination is asymptotically stable.

PROOF. Using Theorem 5.4 with $D = \mathrm{diag}(|x|)$ gives

$$ \eta \le \frac{\psi(n)u}{1 - 2\psi(n)u\,\max\big(|A^{-1}|\,|A|\,|x| \,/\, |x|\big)} $$

for small enough $u$ (the maximum and minimum of a componentwise quotient being taken over components). Hence for

$$ u \le \hat u(n) = \min \frac{|x|}{4\psi(n)\,|A^{-1}|\,|A|\,|x|} $$

we have $\eta \le 2\psi(n)u$. Q.E.D.

Recall that the stability threshold in the case of Gaussian elimination with optimal row ordering was

$$ \hat u(n) = \min \frac{|A|\,|x|}{4\chi(n)\,|A|\,|A^{-1}|\,|A|\,|x|}. $$

It is easy to show that this is larger than the stability threshold for optimal column ordering. This may be a slight indication that column pivoting is superior to row pivoting.

5.2 SCALING FOR ACCURACY. We conclude by studying the effect of scaling on a good "forward" error bound.

THEOREM 5.6. The error

$$ |\hat x - x| \le \frac{\psi(n)u\,|A^{-1}|\,|AD|e\,\|D^{-1}x\|}{1 - \psi(n)u\,\mathrm{Cond}(AD)}. $$

PROOF. We have

$$ |\hat x - x| = |A^{-1}(-r)| \le \psi(n)u\,|A^{-1}|\,|AD|e\,\|D^{-1}\hat x\|, $$

and the theorem follows from Lemma 5.3. Q.E.D.

If higher-order terms in $u$ are ignored, the bound on the error is minimized by choosing $D = \mathrm{diag}(|x|)$. Thus for row pivoting,

$$ \frac{\|\,|A^{-1}|\,|A|\,\|\;\|x\|}{\|\,|A^{-1}|\,|A|\,|x|\,\|} $$


is a measure of the possible effect of ill scaling on the "forward" error. For a similar bound on $\|\hat x - x\|_1$ in terms of the $l_1$ norm, the optimal choice is $D^{-1} = \mathrm{diag}(e^T|A^{-1}|\,|A|)$, which depends only on the coefficient matrix.
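The ratio above is directly computable; a sketch under the same caveats as before (our function name, explicit inverse, infinity norms):

```python
import numpy as np

def ill_scaling_factor(A, x):
    """|| |A^{-1}||A| || * ||x||  divided by  || |A^{-1}||A| |x| ||:
    the factor by which the forward-error bound can exceed its value
    for the optimal scaling D = diag(|x|)."""
    M = np.abs(np.linalg.inv(A)) @ np.abs(A)
    num = np.linalg.norm(M, ord=np.inf) * np.max(np.abs(x))
    den = np.max(M @ np.abs(x))
    return float(num / den)
```

When all components of $x$ have equal magnitude the factor is 1, i.e. no scaling can improve the bound.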

6. Practical Implications

The comments that follow are suggested by the error analysis, but their usefulness remains to be established.

By means of examples it has been shown that Gaussian elimination with (partial or complete) pivoting does not generally provide all the accuracy that the data deserve, or even a fixed fraction of that accuracy. Hamming [9, p. 121] states:

It is reasonable to ask how typical these examples are and how often in the past the pivoting method has created the ill conditioning that was reported to occur by some library routines. The answers are not known at this time; all that is claimed is that textbooks and library descriptions rarely, if ever, mention this possibility (though it is apparently known in the folklore).

And so it seems that there have been practical instances where the pivoting method has performed poorly. Perhaps Gaussian elimination without iterative improvement should be regarded as a "quick and dirty" way to solve general linear equations.

The computation of the backward error is one reliable test for deciding whether or not the solution of a linear system is "reasonably accurate." The test can be made quite efficient by accumulating $r$ and $|A|\,|\hat x| + |b|$ at the same time. If the test is failed, then in most cases the use of iterative improvement would result in a solution that passes the test; for it is shown by Skeel [20] that a single iteration in single precision is enough to make Gaussian elimination asymptotically stable. One could, of course, forgo the backward error computation and just do iterative improvement until "convergence." But such a procedure may not be completely reliable since it has not been rigorously proved that "convergence" implies a reasonably accurate solution. Stewart [21, p. 205] mentions "the possibility that, with a violently ill-conditioned matrix, the iteration may appear to converge to a false solution."
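The backward error test described here can be sketched as follows (the function name is ours; the quantity is the componentwise measure of Theorem 3.2, due to Oettli and Prager [17]):

```python
import numpy as np

def backward_error(A, b, x_hat):
    """Componentwise backward error: eta = max_i |r_i| / (|A||x_hat| + |b|)_i,
    the smallest eps such that (A + dA) x_hat = b + db for some
    |dA| <= eps |A| and |db| <= eps |b|."""
    r = A @ x_hat - b
    denom = np.abs(A) @ np.abs(x_hat) + np.abs(b)
    eta = 0.0
    for ri, di in zip(np.abs(r), denom):
        if di > 0:
            eta = max(eta, ri / di)
        elif ri != 0:
            # A zero denominator forces the residual component to vanish.
            eta = np.inf
    return eta
```

A solution is then accepted when `backward_error(A, b, x_hat)` is a modest multiple of the unit roundoff $u$.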

The success of the pivoting method depends upon a reasonable scaling of the equations, which is at best guesswork unless one has some knowledge about the sizes of the solution components. If $c = |x|$, then

(i) for row pivoting one should scale the system to get $(D_1^{-1}A)x = D_1^{-1}b$ where $D_1 = \mathrm{diag}(|A|c)$;

(ii) for complete pivoting one should scale the system to get $(D_1^{-1}AD_2)(D_2^{-1}x) = D_1^{-1}b$ where $D_1 = \mathrm{diag}(|A|c)$ and $D_2 = \mathrm{diag}(c)$.

It may be worthwhile to allow users of a linear equation solver to provide an estimate of the solution, particularly if separate subroutine parameters are used for the solution and the right-hand side. For simple use of the program, an estimate of all ones could be used.
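Recommendations (i) and (ii) translate into a few lines of code; a sketch with hypothetical names (`scale_system`, with the estimate `c` defaulting to all ones as suggested above):

```python
import numpy as np

def scale_system(A, b, c=None, complete=False):
    """Scale Ax = b given an estimate c of |x|:
      row pivoting:      D1^{-1} A x = D1^{-1} b,            D1 = diag(|A| c);
      complete pivoting: (D1^{-1} A D2)(D2^{-1} x) = D1^{-1} b,  D2 = diag(c).
    In exact arithmetic the solution is unchanged; the scaling only
    affects which pivots the elimination selects."""
    c = np.ones(A.shape[1]) if c is None else np.asarray(c, dtype=float)
    d1 = np.abs(A) @ c                  # row scale factors D1 = diag(|A| c)
    A1, b1 = A / d1[:, None], b / d1    # D1^{-1} A,  D1^{-1} b
    if not complete:
        return A1, b1
    return A1 * c[None, :], b1          # D1^{-1} A D2; caller solves for D2^{-1} x
```

In the complete-pivoting case the caller recovers $x$ from the solution $y = D_2^{-1}x$ as $x = D_2 y = c\,y$.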

Appendix A. Error Bounds for Column Pivoting

For any $n \times n$ matrix $C = (c_{ij})$, let $\tilde c_{ij} = c_{ij}/d_i$. Also, let $\omega = 1 + u$.

LEMMA A1. We have

$$ |m_{ik}| \le \omega\,|d_i/d_k|, \quad i > k, $$

and

$$ |\tilde a_{ij}^{(k)}| \le \omega^{k+1} \sum_{l=1}^{k-1} (2\omega)^{k-1-l}\,|\tilde a_{lj}| + \omega^{k-1}\,|\tilde a_{ij}|, \quad i, j \ge k. $$

PROOF. Equation (4.1) implies

$$ |m_{ik}| \le \omega\,|a_{ik}^{(k)}/a_{kk}^{(k)}|, \quad i \ge k, $$

and because of (4.5) we get the first inequality of the lemma. Equation (4.2) implies

$$ |a_{ij}^{(k+1)}| \le \omega\,|a_{ij}^{(k)}| + (1+2u)\,|m_{ik}\,a_{kj}^{(k)}|, \quad i, j \ge k+1, $$


and therefore

$$ |\tilde a_{ij}^{(k+1)}| \le \omega\,|\tilde a_{ij}^{(k)}| + \omega(1+2u)\,|\tilde a_{kj}^{(k)}|, \quad i, j \ge k+1. $$

The second inequality of the lemma follows from this by induction on $k$. Q.E.D.

LEMMA A2. The matrices $L = (l_{ij})$ and $U = (u_{ij})$ satisfy

$$ LU = A + E^{(1)} + E^{(2)} + \cdots + E^{(n-1)}, $$

where the matrices $E^{(k)}$ have elements $\epsilon_{ij}^{(k)}$ which satisfy

$$ |\epsilon_{ij}^{(k)}| \le \begin{cases} \omega^{-1}u\,|a_{ij}^{(k)}| + (2+3u)\,\omega^{-2}u\,|m_{ik}\,a_{kj}^{(k)}| & \text{for } i, j > k, \\ \omega^{-1}u\,|a_{ij}^{(k)}| & \text{for } i > j = k, \\ 0 & \text{otherwise}, \end{cases} $$

regardless of the pivoting strategy.

PROOF. Define the elements of $E^{(k)}$ by

$$ \epsilon_{ij}^{(k)} = \begin{cases} a_{ij}^{(k+1)} - a_{ij}^{(k)} + m_{ik}\,a_{kj}^{(k)} & \text{for } i, j > k, \\ m_{ik}\,a_{kk}^{(k)} - a_{ik}^{(k)} & \text{for } i > j = k, \\ 0 & \text{otherwise}. \end{cases} $$

By separately considering the cases $i \le j$ and $i > j$, it is straightforward to show that the elements $\epsilon_{ij}^{(k)}$ satisfy

$$ \sum_{k=1}^{n-1} \epsilon_{ij}^{(k)} = \sum_{k=1}^{n} l_{ik}u_{kj} - a_{ij}, $$

which establishes the equality of the lemma. Let $k \le n-1$ be fixed. Write (4.1) as

$$ m_{ik} = (a_{ik}^{(k)}/a_{kk}^{(k)})(1 + \delta_{ik}), \quad i \ge k+1, $$

and (4.2) as

$$ a_{ij}^{(k+1)} = \big(a_{ij}^{(k)} - m_{ik}\,a_{kj}^{(k)}(1 + \delta_{ij})\big)(1 + \delta'_{ij}), \quad i, j \ge k+1, $$

where the $\delta$'s are relative roundoff errors. Then

$$ \epsilon_{ij}^{(k)} = a_{ij}^{(k)}\,\delta'_{ij} - m_{ik}\,a_{kj}^{(k)}\,(\delta_{ij} + \delta'_{ij} + \delta_{ij}\delta'_{ij}), $$

and the lemma follows from the bound $u/(1+u)$ on the $\delta$'s. Q.E.D.

LEMMA A3. The matrices $L$ and $U$ satisfy

$$ LU = A + E $$

with

$$ \|\,|D^{-1}E|z\,\| \le (3 \cdot 2^{n-1} - 3)\,\omega^{2n-1}u\,\|\,|D^{-1}A|z\,\| $$

for arbitrary $z \ge 0$.

PROOF. Let $E = E^{(1)} + E^{(2)} + \cdots + E^{(n-1)}$ where the $E^{(k)}$ are given by Lemma A2. Substituting the bound on $m_{ik}$ of Lemma A1 into the bound on the elements $\epsilon_{ij}^{(k)}$ we get

$$ |\tilde\epsilon_{ij}^{(k)}| \le \omega^{-1}u\,|\tilde a_{ij}^{(k)}| + (2+3u)\,\omega^{-1}u\,|\tilde a_{kj}^{(k)}|, $$

which, in fact, is valid for all $i, j$. This implies

$$ \sum_j |\tilde\epsilon_{ij}^{(k)}|\,z_j \le 3u\,\|\,|D^{-1}A^{(k)}|z\,\| $$

for arbitrary $z \ge 0$. From Lemma A1 it immediately follows that $\|\,|D^{-1}A^{(k)}|z\,\| \le 2^{k-1}\omega^{2k-1}\,\|\,|D^{-1}A|z\,\|$; and so we have

$$ \|\,|D^{-1}E^{(k)}|z\,\| \le 3 \cdot 2^{k-1}\,\omega^{2n-1}u\,\|\,|D^{-1}A|z\,\|, $$


from which the lemma follows. Q.E.D.

LEMMA A4. We have

$$ |l_{ij}| \le \omega\,|d_i/d_j|, \quad i > j, $$

and

$$ |\tilde u_{ij}| \le \omega^{2i-1} \Big( \sum_{l=1}^{i-1} 2^{i-1-l}\,|\tilde a_{lj}| + |\tilde a_{ij}| \Big), \quad i \le j. $$

PROOF. Follows from Lemma A1 because $l_{ij} = m_{ij}$ and $u_{ij} = a_{ij}^{(i)}$. Q.E.D.

LEMMA A5. The vector $\hat x$ computed by (4.4) satisfies $(U + \delta U)\hat x = y$ for some upper triangular matrix $\delta U$ such that

$$ |\delta u_{ij}| \le u\,g(i,j)\,\omega^{n-1}\,|u_{ij}| \quad \text{and} \quad |u_{ij} + \delta u_{ij}| \le \omega^{n-j+2}\,|u_{ij}|, $$

where

$$ g(i,j) = \begin{cases} n-j+2, & j \ge i+2, \\ n-j+1, & j = i+1, \\ 2, & j = i \le n-1, \\ 1, & j = i = n, \end{cases} $$

regardless of the pivoting strategy.

PROOF. From (4.4) we have that

$$ \frac{u_{ii}\hat x_i + (\cdots(u_{i,i+1} \times \hat x_{i+1} + u_{i,i+2} \times \hat x_{i+2})\cdots + u_{in} \times \hat x_n)}{(1+\delta_i)(1+\delta'_i)} = y_i, \quad i \le n-1, $$

and

$$ \frac{u_{nn}\hat x_n}{1+\delta_n} = y_n, $$

where $\delta_i$ and $\delta'_i$ are relative roundoff errors due to subtraction and division, respectively. By the usual type of analysis we can obtain the bounds

$$ |\delta u_{ij}| \le \begin{cases} (\omega^{n-j+2} - 1)\,|u_{ij}|, & j \ge i+2, \\ (\omega^{n-j+1} - 1)\,|u_{ij}|, & j = i+1, \\ (\omega^2 - 1)\,|u_{ii}|, & j = i \le n-1, \\ (\omega - 1)\,|u_{ij}|, & j = i = n, \end{cases} $$

from which the lemma follows. Q.E.D.

LEMMA A6. The matrix $\delta U$ of Lemma A5 satisfies

$$ \|\,|D^{-1}L\,\delta U|z\,\| \le [5 \cdot 2^{n-2} - 2]\,\omega^{2n}u\,\|\,|D^{-1}A|z\,\| $$

for arbitrary $z \ge 0$.

PROOF. From Lemma A4 it follows that

$$ (|D^{-1}L\,\delta U|z)_i \le \sum_j \sum_k |d_i^{-1}\,l_{ij}\,d_j|\,|\widetilde{\delta u}_{jk}|\,z_k \le \omega \sum_j \sum_k |\widetilde{\delta u}_{jk}|\,z_k. $$

Applying Lemma A5 first and then Lemma A4 gives

$$ (|D^{-1}L\,\delta U|z)_i \le u \sum_{j} \sum_{k \le j} g(k,j)\,\omega^{n-j+1}\,|\tilde u_{kj}|\,z_j \le \omega^{2n}u \sum_{j} \sum_{k \le j} \sum_{l=1}^{k} 2^{k-1-l}\,g(k,j)\,|\tilde a_{lj}|\,z_j, $$

the $l = k$ term being understood with coefficient 1.


After much manipulation it turns out that

$$ \sum_{l=1}^{n} \max_{j} \sum_{k \le j} 2^{k-1-l}\,g(k,j) \le 5 \cdot 2^{n-2} - 2, $$

and therefore

$$ (|D^{-1}L\,\delta U|z)_i \le [5 \cdot 2^{n-2} - 2]\,\omega^{2n}u\,\|\,|D^{-1}A|z\,\|, $$

from which the lemma follows. Q.E.D.

LEMMA A7. The vector $y$ computed by (4.3) satisfies

$$ (L + \delta L)y = b $$

for some lower triangular matrix $\delta L$ whose elements $\delta l_{ij}$ satisfy

$$ |\delta l_{ij}| \le \min\{n-j+1,\,n-1\}\,\omega^{n-j}u\,|a_{ij}^{(j)}/a_{jj}^{(j)}|, \quad i \ge j, $$

regardless of the pivoting strategy.

PROOF. We have from (4.3) that

$$ \frac{(\cdots(l_{i1} \times y_1 + l_{i2} \times y_2)\cdots + l_{i,i-1} \times y_{i-1}) + y_i}{1 + \delta_i} = b_i, \quad i \ge 2, $$

where $\delta_i$ is a relative roundoff error. By the usual type of analysis we get that

$$ |\delta l_{i1}| \le \big((1 + \omega^{-1}u)^{i-1} - 1\big)\,|l_{i1}|, \quad i \ge 2, $$

$$ |\delta l_{ij}| \le \big((1 + \omega^{-1}u)^{i-j+1} - 1\big)\,|l_{ij}|, \quad i > j \ge 2, $$

and

$$ |\delta l_{ii}| \le u\,|l_{ii}|. $$

The lemma follows from the inequalities

$$ (1 + \omega^{-1}u)^k - 1 \le \omega^{-1}uk\,(1 + \omega^{-1}u)^{k-1} \le k\,\omega^{k-2}u $$

and

$$ |l_{ij}| \le \omega\,|a_{ij}^{(j)}/a_{jj}^{(j)}|. \qquad \text{Q.E.D.} $$

LEMMA A8. The matrices $\delta U$ and $\delta L$ of Lemmas A5 and A7 satisfy

$$ \|\,|D^{-1}\delta L(U + \delta U)|z\,\| \le (2^{n+1} - n - 3)\,\omega^{2n}u\,\|\,|D^{-1}A|z\,\| $$

for arbitrary $z \ge 0$.

PROOF. Using the bound of Lemma A7 and inequality (4.5), we get

$$ |\delta l_{ij}| \le \min\{n-j+1,\,n-1\}\,\omega^{n-j}u\,|d_i/d_j|, \quad i \ge j. $$

Hence

$$ (|D^{-1}\delta L(U + \delta U)|z)_i \le u \sum_j \min\{n-j+1,\,n-1\}\,\omega^{n-j}\,(|D^{-1}(U + \delta U)|z)_j. $$

It immediately follows from Lemma A5 that

$$ (|D^{-1}(U + \delta U)|z)_j \le \omega^{n-j+2}\,(|D^{-1}U|z)_j, $$

and from Lemma A4 that

$$ (|D^{-1}(U + \delta U)|z)_j \le \omega^{n+1}\,2^{j-1}\,\|\,|D^{-1}A|z\,\|. $$

Therefore,

$$ (|D^{-1}\delta L(U + \delta U)|z)_i \le \omega^{2n}u \sum_{j=1}^{n} \min\{n-j+1,\,n-1\}\,2^{j-1}\,\|\,|D^{-1}A|z\,\|, $$


from which the lemma follows. Q.E.D.

THEOREM 4.1. Let the vector $\hat x$ be computed by Gaussian elimination with column pivoting and row scaling where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is the matrix of reciprocal scale factors. Then

$$ \|D^{-1}r\| \le \chi(n)u\,\|\,|D^{-1}A|\,|\hat x|\,\| $$

for arbitrary $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ satisfying (4.5), where

$$ \chi(n) = [19 \cdot 2^{n-2} - n - 8]\,\omega^{2n}. $$

PROOF. From Lemmas A3, A6, and A8 we have the bounds

$$ \|D^{-1}E\hat x\| \le (3 \cdot 2^{n-1} - 3)\,\omega^{2n-1}u\,\|\,|D^{-1}A|\,|\hat x|\,\|, $$

$$ \|D^{-1}L\,\delta U\hat x\| \le [5 \cdot 2^{n-2} - 2]\,\omega^{2n}u\,\|\,|D^{-1}A|\,|\hat x|\,\|, $$

and

$$ \|D^{-1}\delta L(U + \delta U)\hat x\| \le (2^{n+1} - n - 3)\,\omega^{2n}u\,\|\,|D^{-1}A|\,|\hat x|\,\|. $$

The theorem follows from the equation

$$ (A + E + \delta L\,U + (L + \delta L)\,\delta U)\hat x = b. \qquad \text{Q.E.D.} $$

THEOREM 4.2. There exists a problem $Ax = b$ and a floating-point arithmetic $(+, -, \times, /)$ such that the solution $\hat x$ computed by Gaussian elimination with partial or complete pivoting satisfies

$$ \frac{\|r\|}{\|\,|A|\,|\hat x|\,\|} \ge [19 \cdot 2^{n-2} - n - 8]\,u + O(u^2). $$

Therefore, the bound of Theorem 4.1 is the best possible bound up to first-order terms in $u$.

PROOF. Obvious for $n = 1$. Assume $n \ge 2$. Let

$$ A = \begin{pmatrix} M & & & & 1 \\ -M & M & & & 1 \\ -M & -M & M & & 1 \\ \vdots & & \ddots & \ddots & \vdots \\ -M & -M & \cdots & -M & 1 \end{pmatrix} $$

and

$$ b_k = \begin{cases} 1 + (7 \cdot 2^{k-1} - k - 8)u, & k \le n-2, \\ 1 + (2^{n+1} - n - 7)u, & k = n-1, \\ 1 + (19 \cdot 2^{n-2} - n - 8)u, & k = n. \end{cases} $$

If $M$ is large enough, then there are no interchanges even with complete pivoting. We have

$$ m_{ij} = m = (-M)/M, \quad i > j, $$

$$ a_{in}^{(1)} = 1, $$

$$ a_{in}^{(k)} = a_{in}^{(k-1)} - m \times u_{k-1,n}, \quad i \ge k \ge 2. $$

Suppose that all these floating-point operations increase the magnitude of the result by a factor $(1+2u)/(1+u)$. Then

$$ m = -(1+u) + O(u^2) $$

and

$$ a_{in}^{(k)} = (1+u)\,a_{in}^{(k-1)} + (1+3u)\,u_{k-1,n} + O(u^2). $$

By reduction on $k$, it follows that

$$ a_{in}^{(k)} = 2^{k-1} + 2^k(k-1)u + O(u^2), \quad i \ge k, $$


and hence

$$ u_{kn} = 2^{k-1} + 2^k(k-1)u + O(u^2). $$

We have

$$ y_1 = b_1, \qquad y_k = b_k - s_{k-1}, \quad k \ge 2, $$

where

$$ s_k = (\cdots(m \times y_1 + m \times y_2)\cdots + m \times y_k). $$

Then

$$ s_1 = m \times b_1 $$

and

$$ s_k = s_{k-1} + m \times (b_k - s_{k-1}), \quad 2 \le k \le n-1, $$

all operations being floating-point.

Suppose that all these floating-point operations reduce the magnitude of the result by a factor $1/(1+u)$. Hence

$$ s_1 = -b_1 + O(u^2) $$

and

$$ s_k = (2-3u)\,s_{k-1} - (1-2u)\,b_k + O(u^2), \quad 2 \le k \le n-1. $$

By induction on $k$, it follows that

$$ s_k = 1 - 2^k - 2^{k+1}(k-4)u - (k+9)u + O(u^2), \quad k \le n-2, $$

and

$$ s_{n-1} = 1 - 2^{n-1} - 2^{n-2}(4n-19)u - (n+8)u + O(u^2). $$

Also we have

$$ y_1 = 1 - 2u, \qquad y_k = (b_k - s_{k-1})(1 - u + O(u^2)), \quad k \ge 2. $$

From this we get

$$ y_k = 2^{k-1} + 2^k(k-2)u + O(u^2), \quad k \le n-2, $$

$$ y_{n-1} = 2^{n-2} + 2^{n-2}(2n-5)u + O(u^2), $$

$$ y_n = 2^{n-1} + 2^{n-1}(2n-1)u + O(u^2). $$

We have

$$ \hat x_n = y_n / u_{nn}, \qquad \hat x_k = \big(y_k - (0 + u_{kn} \times \hat x_n)\big)/M, \quad k \le n-2. $$

Suppose that all these floating-point operations reduce the magnitude of the result by a factor $1/(1+u)$. Then

$$ \hat x_n = (y_n/u_{nn})(1-u) = 1 + O(u^2), $$

$$ u_{n-1,n} \times \hat x_n = u_{n-1,n}(1-u) = 2^{n-2} + 2^{n-2}(2n-5)u + O(u^2), $$

$$ \hat x_{n-1} = O(u^2), $$

$$ 0 + u_{kn} \times \hat x_n = u_{kn}(1-2u) = 2^{k-1} + 2^k(k-2)u + O(u^2), \quad k \le n-2, $$


and hence

$$ \hat x_k = O(u^2), \quad k \le n-2. $$

Hence

$$ (b - A\hat x)_n = b_n - 1 + O(u^2) = (19 \cdot 2^{n-2} - n - 8)u + O(u^2). \qquad \text{Q.E.D.} $$

Appendix B. Error Bounds for Row Pivoting

For any matrix $C = (c_{ij})$ let $\tilde c_{ij} = c_{ij}d_j$. Also, let $\omega = 1 + u$.

LEMMA B1. We have

$$ |m_{ik}\,\tilde a_{kj}^{(k)}| \le \omega\,|\tilde a_{ik}^{(k)}|, \quad i > k,\ j \ge k, $$

and

$$ |\tilde a_{ij}^{(k)}| \le \omega^{k+1} \sum_{l=1}^{k-1} (2\omega)^{k-1-l}\,|\tilde a_{il}| + \omega^{k-1}\,|\tilde a_{ij}|, \quad i, j \ge k. $$

PROOF. Equation (4.1) implies

$$ |m_{ik}\,a_{kj}^{(k)}d_j| \le \omega\,|a_{ik}^{(k)}/a_{kk}^{(k)}|\,|a_{kj}^{(k)}d_j|, \quad i > k,\ j \ge k, $$

and because of (4.5) we get

$$ |m_{ik}\,a_{kj}^{(k)}d_j| \le \omega\,|a_{ik}^{(k)}d_k|, \quad i > k,\ j \ge k, $$

which proves the first inequality. Equation (4.2) implies

$$ |a_{ij}^{(k+1)}| \le \omega\,|a_{ij}^{(k)}| + (1+2u)\,|m_{ik}\,a_{kj}^{(k)}|, \quad i, j \ge k+1, $$

and therefore

$$ |\tilde a_{ij}^{(k+1)}| \le \omega\,|\tilde a_{ij}^{(k)}| + \omega(1+2u)\,|\tilde a_{ik}^{(k)}|, \quad i, j \ge k+1. $$

The second inequality of the lemma follows from this by induction on $k$. Q.E.D.

LEMMA B2. The matrices $L$ and $U$ satisfy

$$ LU = A + E $$

with

$$ |ED|e \le [7 \cdot 2^{n-2} - n - 2]\,\omega^{2n}u\,|AD|e. $$

PROOF. Assume $n \ge 2$. Let $E = E^{(1)} + E^{(2)} + \cdots + E^{(n-1)}$ where the $E^{(k)}$ are given by Lemma A2. Substituting the bound on $m_{ik}$ of Lemma B1 into the bound on the elements $\epsilon_{ij}^{(k)}$ we get

$$ |\tilde\epsilon_{ij}^{(k)}| \le \begin{cases} u\,|\tilde a_{ij}^{(k)}| + 2\omega u\,|\tilde a_{ik}^{(k)}| & \text{for } i, j > k, \\ u\,|\tilde a_{ij}^{(k)}| & \text{for } i > j = k, \\ 0 & \text{otherwise}. \end{cases} $$

This implies

$$ \sum_j |\tilde\epsilon_{ij}^{(k)}| \le (2n - 2k + 1)\,\omega u\,|\tilde a_{ik}^{(k)}| + u \sum_{j=k+1}^{n} |\tilde a_{ij}^{(k)}|, \quad i > k, $$

and from Lemma B1 it follows that

$$ \sum_j |\tilde\epsilon_{ij}^{(k)}| \le (2n-2k+1)\,\omega^{2k}u \Big( \sum_{l=1}^{k-1} 2^{k-1-l}\,|\tilde a_{il}| + |\tilde a_{ik}| \Big) + (n-k)\,\omega^{2k-1}u \sum_{l=1}^{k-1} 2^{k-1-l}\,|\tilde a_{il}| + \omega^{k-1}u \sum_{j=k+1}^{n} |\tilde a_{ij}| $$

$$ \le \begin{cases} (2n-1)\,\omega^2 u \sum_j |\tilde a_{ij}| & \text{if } k = 1, \\ (3n-3k+1)\,\omega^{2k}u\,2^{k-2} \sum_j |\tilde a_{ij}| & \text{if } k \ge 2. \end{cases} $$

The lemma follows because

$$ (2n-1) + \sum_{k=2}^{n-1} (3n-3k+1)\,2^{k-2} = 7 \cdot 2^{n-2} - n - 2. \qquad \text{Q.E.D.} $$

LEMMA B3. We have

$$ |l_{ij}\,\tilde u_{jk}| \le \omega\,|\tilde a_{ik}^{(j)}|, $$

$$ |\tilde u_{ij}| \le \omega^{2i-1} \Big( \sum_{l=1}^{i-1} 2^{i-1-l}\,|\tilde a_{il}| + |\tilde a_{ij}| \Big), $$

and

$$ |\tilde u_{ij}| \le |\tilde u_{ii}|, \quad i \le j. $$

PROOF. The first two inequalities follow immediately from Lemma B1, and the third inequality is a consequence of column pivoting. Q.E.D.

LEMMA B4. The matrix $\delta U$ of Lemma A5 satisfies

$$ |L\,\delta U D|e \le (2^{n+1} - n - 2)\,\omega^{2n}u\,|AD|e. $$

PROOF. Applying the inequality $|\tilde u_{ij}| \le |\tilde u_{ii}|$ of Lemma B3 and the bounds of Lemma A5, we get for $i \le n$

$$ \sum_j |\widetilde{\delta u}_{ij}| \le u \sum_j g(i,j)\,\omega^{n-i}\,|\tilde u_{ii}| \le \frac{(n-i+2)(n-i+1)}{2}\,\omega^{n-i}u\,|\tilde u_{ii}|. $$

Therefore,

$$ (|L\,\delta U D|e)_i = \sum_j |l_{ij}| \sum_k |\widetilde{\delta u}_{jk}| \le u \sum_j \frac{(n-j+2)(n-j+1)}{2}\,\omega^{n-j+1}\,|l_{ij}\,\tilde u_{jj}|, $$

and so using Lemma B3 gives

$$ (|L\,\delta U D|e)_i \le \omega^{2n}u \sum_{j=1}^{n} \frac{(n-j+2)(n-j+1)}{2}\;2^{j-2} \sum_{l=1}^{n} |\tilde a_{il}| \le (2^{n+1} - n - 2)\,\omega^{2n}u\,(|AD|e)_i, $$

from which the lemma follows. Q.E.D.

LEMMA B5. The matrices $\delta U$ and $\delta L$ of Lemmas A5 and A7 satisfy

$$ |\delta L(U + \delta U)D|e \le 3(2^n - n - 1)\,\omega^{2n}u\,|AD|e. $$

PROOF. Applying the inequality $|\tilde u_{ij}| \le |\tilde u_{ii}|$ of Lemma B3 and the bounds of Lemma A5, we get

$$ \sum_{j} |\tilde u_{ij} + \widetilde{\delta u}_{ij}| \le \sum_{j} \omega^{n-j+2}\,|\tilde u_{ii}| \le (n-i+1)\,\omega^{n-i+2}\,|\tilde u_{ii}|. $$

Therefore applying the bounds of Lemma A7 gives


$$ (|\delta L(U + \delta U)D|e)_i = \sum_j |\delta l_{ij}| \sum_k |\tilde u_{jk} + \widetilde{\delta u}_{jk}| \le u \sum_j \min\{n-j+1,\,n-1\}\,\omega^{n-j}\,|a_{ij}^{(j)}/a_{jj}^{(j)}|\,(n-j+1)\,\omega^{n-j+2}\,|\tilde u_{jj}|, $$

and so using Lemma B3 gives

$$ (|\delta L(U + \delta U)D|e)_i \le \omega^{2n}u \sum_{j=1}^{n} \min\{n-j+1,\,n-1\}\,(n-j+1)\,2^{j-2} \sum_{l=1}^{n} |\tilde a_{il}| \le 3(2^n - n - 1)\,\omega^{2n}u\,(|AD|e)_i, $$

from which the lemma follows. Q.E.D.

THEOREM 5.1. Let the vector $\hat x$ be computed by Gaussian elimination with row pivoting and column scaling where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is the matrix of scale factors. Then

$$ |r| \le \psi(n)u\,|AD|e\,\|D^{-1}\hat x\| $$

for arbitrary $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ satisfying (5.1), where

$$ \psi(n) = [27 \cdot 2^{n-2} - 5n - 7]\,\omega^{2n}. $$

PROOF. The theorem follows from the bounds of Lemmas B2, B4, and B5 and from the equality

$$ (A + E + \delta L\,U + (L + \delta L)\,\delta U)\hat x = b. \qquad \text{Q.E.D.} $$

Note added in proof: Theorem 3.4 appears in a recent paper of R. Schaback, "Eine rundungsgenaue Formel zur maschinellen Berechnung der Prager-Oettli-Schranke," Computing 20 (1978), 177-182. Also, related work has been done by E. Lipper in his Ph.D. dissertation "A multiplicative perturbation approach to backward error analysis for systems of linear algebraic equations," Stevens Inst. of Technology, Hoboken, N.J., Jan. 1977.

REFERENCES

1. BAUER, F.L. On the definition of condition numbers and their relation to closed methods for solving linear systems. Information Processing, Proc. Int. Conf. Inform. Processing, Unesco, Paris, June 1959, Butterworth, London, 1960, pp. 109-110.

2. BAUER, F.L. Optimally scaled matrices. Numer. Math. 5 (1963), 78-87.

3. BAUER, F.L. Genauigkeitsfragen bei der Lösung linearer Gleichungssysteme. ZAMM 46, 7 (Nov. 1966), 409-421.

4. BAUER, F.L. Computational graphs and rounding error. SIAM J. Numer. Anal. 11, 1 (Mar. 1974), 87-96.

5. CURTIS, A.R., AND REID, J.K. On the automatic scaling of matrices for Gaussian elimination. J. Inst. Math. Applic. 10, 1 (Aug. 1972), 118-124.

6. FORSYTHE, G.E., AND MOLER, C.B. Computer Solution of Linear Algebraic Systems. Prentice-Hall, Englewood Cliffs, N.J., 1967.

7. FORSYTHE, G.E., AND STRAUS, E.G. On best conditioned matrices. Proc. Amer. Math. Soc. 6 (1955), 340-345.

8. GEAR, C.W. Numerical errors in sparse linear equations. File F-75-885, Dept. Comptr. Sci., U. of Illinois at Urbana-Champaign, Urbana, Ill., May 1975.

9. HAMMING, R.W. Introduction to Applied Numerical Analysis. McGraw-Hill, New York, 1971.

10. JANKOWSKI, M., AND WOŹNIAKOWSKI, M. Iterative refinement implies numerical stability. BIT 17 (1977), 303-311.

11. KAHAN, W. Numerical linear algebra. Canad. Math. Bull. 9 (1966), 757-801.

12. MILLER, W. Automatic a priori round-off analysis, I. Computing 10 (1972), 97-106.

13. MILLER, W. On the stability of finite numerical procedures. Numer. Math. 19 (1972), 425-432.

14. MILLER, W. Computer search for numerical instability. J. ACM 22, 4 (Oct. 1975), 512-521.

15. MILLER, W. Roundoff analysis by direct comparison of two algorithms. SIAM J. Numer. Anal. 13, 3 (June 1976), 382-392.

16. MILLER, W. Roundoff analyses and sparse data. Numer. Math. 29, 1 (1977), 37-43.

17. OETTLI, W., AND PRAGER, W. Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides. Numer. Math. 6 (1964), 405-409.

18. PETERS, G., AND WILKINSON, J.H. On the stability of Gauss-Jordan elimination with pivoting. Comm. ACM 18, 1 (Jan. 1975), 20-24.

19. SHERMAN, A.H. Algorithms for sparse Gaussian elimination with partial pivoting. Rep. R-76-817, Dept. Comput. Sci., U. of Illinois at Urbana-Champaign, Urbana, Ill., July 1976.

20. SKEEL, R.D. Iterative refinement implies numerical stability for Gaussian elimination. Manuscript, Dept. Comptr. Sci., U. of Illinois at Urbana-Champaign, Urbana, Ill., July 1978. Submitted to a technical journal.

21. STEWART, G.W. Introduction to Matrix Computations. Academic Press, New York, 1973.

22. VAN DER SLUIS, A. Stability of solutions of linear algebraic systems. Numer. Math. 14 (1970), 246-251.

23. VAN DER SLUIS, A. Condition, equilibration, and pivoting in linear algebraic systems. Numer. Math. 15 (1970), 74-86.

24. WILKINSON, J.H. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, N.J., 1963.

RECEIVED APRIL 1977; REVISED SEPTEMBER 1978
