    The Conjugate Gradient Method

    Tom Lyche

    University of Oslo


    Plan for the day

    The method

    Algorithm

    Implementation of test problems


    Derivation of the method


    The Conjugate gradient method

    Restricted to positive definite systems: $Ax = b$, $A \in \mathbb{R}^{n,n}$ positive definite.

    Generate $\{x_k\}$ by $x_{k+1} = x_k + \alpha_k p_k$, where $p_k$ is a vector, the search direction, and $\alpha_k$ is a scalar determining the step length.

    In general we find the exact solution in at most $n$ iterations.

    For many problems the error becomes small after a few iterations.

    Both a direct method and an iterative method.

    Rate of convergence depends on the square root of the condition number.

    The name of the game

    Conjugate means orthogonal; orthogonal gradients.

    But why gradients?

    Consider minimizing the quadratic function $Q : \mathbb{R}^n \to \mathbb{R}$ given by $Q(x) := \frac12 x^T A x - x^T b$.

    The minimum is obtained by setting the gradient equal to zero.

    $\nabla Q(x) = Ax - b = 0 \iff$ linear system $Ax = b$

    Find the solution by solving $r = b - Ax = 0$.

    The sequence $\{x_k\}$ is such that $\{r_k\} := \{b - Ax_k\}$ is orthogonal with respect to the usual inner product in $\mathbb{R}^n$.

    The search directions are also orthogonal, but with respect to a different inner product.

    The algorithm

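    Start with some $x_0$. Set $p_0 = r_0 = b - Ax_0$.

    For $k = 0, 1, 2, \ldots$

    $x_{k+1} = x_k + \alpha_k p_k$, $\alpha_k = \dfrac{r_k^T r_k}{p_k^T A p_k}$

    $r_{k+1} = b - Ax_{k+1} = r_k - \alpha_k A p_k$

    $p_{k+1} = r_{k+1} + \beta_k p_k$, $\beta_k = \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$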

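    Example

    $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

    Start with $x_0 = 0$.

    $p_0 = r_0 = b = [1, 0]^T$

    $\alpha_0 = \dfrac{r_0^T r_0}{p_0^T A p_0} = \dfrac12$, $x_1 = x_0 + \alpha_0 p_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \dfrac12 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix}$

    $r_1 = r_0 - \alpha_0 A p_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \dfrac12 \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1/2 \end{bmatrix}$, $r_1^T r_0 = 0$

    $\beta_0 = \dfrac{r_1^T r_1}{r_0^T r_0} = \dfrac14$, $p_1 = r_1 + \beta_0 p_0 = \begin{bmatrix} 0 \\ 1/2 \end{bmatrix} + \dfrac14 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/4 \\ 1/2 \end{bmatrix}$

    $\alpha_1 = \dfrac{r_1^T r_1}{p_1^T A p_1} = \dfrac23$

    $x_2 = x_1 + \alpha_1 p_1 = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix} + \dfrac23 \begin{bmatrix} 1/4 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 1/3 \end{bmatrix}$

    $r_2 = 0$, exact solution.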
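A minimal Python sketch of the iteration may help in checking hand computations like the one above. It is written in Python rather than the MATLAB used elsewhere in these slides, uses dense lists for simplicity, and assumes the 2 x 2 system with matrix [[2, -1], [-1, 2]] and right-hand side [1, 0]. The helper names `matvec`, `dot` and `cg` are illustrative.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u, v):
    """Usual inner product in R^n."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def cg(A, b, x0, tol=1e-12, itmax=100):
    """Conjugate gradient iteration; returns (x, number of iterations used)."""
    x = list(x0)
    r = [b_i - t_i for b_i, t_i in zip(b, matvec(A, x))]
    p = list(r)
    rho = dot(r, r)
    rho0 = rho
    if rho0 == 0.0:          # the starting vector already solves the system
        return x, 0
    for k in range(itmax + 1):
        if (rho / rho0) ** 0.5 <= tol:
            return x, k
        t = matvec(A, p)
        alpha = rho / dot(p, t)                      # step length
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * t_i for r_i, t_i in zip(r, t)]
        rho_old, rho = rho, dot(r, r)
        beta = rho / rho_old
        p = [r_i + beta * p_i for r_i, p_i in zip(r, p)]  # new search direction
    return x, itmax + 1


# Assumed example data: the 2 x 2 system from the worked example.
A = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 0.0]
x, iterations = cg(A, b, [0.0, 0.0])
print(x, iterations)
```

Under these assumptions the sketch stops after two steps with an approximation of $[2/3, 1/3]^T$, in line with the statement that at most $n$ iterations are needed.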

    Exact method and iterative method

    Orthogonality of the residuals implies that $x_m$ is equal to the solution $x$ of $Ax = b$ for some $m \le n$.

    For if $x_k \ne x$ for all $k = 0, 1, \ldots, n-1$, then $r_k \ne 0$ for $k = 0, 1, \ldots, n-1$, so $\{r_0, \ldots, r_{n-1}\}$ is an orthogonal basis for $\mathbb{R}^n$. But then $r_n \in \mathbb{R}^n$ is orthogonal to all vectors in $\mathbb{R}^n$, so $r_n = 0$ and hence $x_n = x$.

    So the conjugate gradient method finds the exact solution in at most $n$ iterations.

    The convergence analysis shows that $\|x - x_k\|_A$ typically becomes small quite rapidly and we can stop the iteration with $k$ much smaller than $n$.

    It is this rapid convergence which makes the method interesting and in practice an iterative method.

    Conjugate Gradient Algorithm

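    [Conjugate Gradient Iteration] The positive definite linear system $Ax = b$ is solved by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when $\|r_k\|_2/\|r_0\|_2 \le$ tol or k > itmax. itm is the number of iterations used.

    function [x,itm] = cg(A,b,x,tol,itmax)
    r = b - A*x; p = r; rho = r'*r; rho0 = rho;
    for k = 0:itmax
        if sqrt(rho/rho0) <= tol   % relative residual small enough
            itm = k; return
        end
        t = A*p; a = rho/(p'*t);   % matrix times vector and step length
        x = x + a*p; r = r - a*t;  % update iterate and residual
        rhos = rho; rho = r'*r;
        p = r + (rho/rhos)*p;      % new search direction
    end
    itm = itmax + 1;               % tolerance not reached within itmax iterations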

    A family of test problems

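    We can test the methods on the Kronecker sum matrix

    $$A = C_1 \otimes I + I \otimes C_2 =
    \begin{bmatrix}
    C_1 + cI & bI & & & \\
    bI & C_1 + cI & bI & & \\
    & \ddots & \ddots & \ddots & \\
    & & bI & C_1 + cI & bI \\
    & & & bI & C_1 + cI
    \end{bmatrix}$$

    where $C_1 = \operatorname{tridiag}_m(a,c,a)$ and $C_2 = \operatorname{tridiag}_m(b,c,b)$.

    Positive definite if $c > 0$ and $c \ge |a| + |b|$.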

    m = 3, n = 9

    A =

    2c a 0 b 0 0 0 0 0

    a 2c a 0 b 0 0 0 0

    0 a 2c 0 0 b 0 0 0

    b 0 0 2c a 0 b 0 0

    0 b 0 a 2c a 0 b 0

    0 0 b 0 a 2c 0 0 b

    0 0 0 b 0 0 2c a 0

    0 0 0 0 b 0 a 2c a

    0 0 0 0 0 b 0 a 2c

    $b = a = -1$, $c = 2$: Poisson matrix

    $b = a = 1/9$, $c = 5/18$: Averaging matrix
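The block structure can be made concrete with a short Python sketch that assembles the dense matrix entry by entry. The function name `kron_sum_matrix` and the index convention (unknown $j$ inside block $k$ gets row $j + km$) are assumptions chosen to reproduce the $m = 3$ layout shown above. The Poisson parameters are taken as $a = b = -1$, $c = 2$.

```python
def kron_sum_matrix(m, a, b, c):
    """Dense n x n test matrix (n = m*m) stored as a list of rows.

    Diagonal blocks are tridiag(a, 2c, a); off-diagonal blocks are b*I.
    """
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for k in range(m):              # block index
        for j in range(m):          # index inside a block
            row = j + k * m
            A[row][row] = 2.0 * c
            if j > 0:
                A[row][row - 1] = a     # neighbour inside the block
            if j < m - 1:
                A[row][row + 1] = a
            if k > 0:
                A[row][row - m] = b     # neighbour in the previous block
            if k < m - 1:
                A[row][row + m] = b     # neighbour in the next block
    return A


poisson = kron_sum_matrix(3, -1.0, -1.0, 2.0)
averaging = kron_sum_matrix(3, 1 / 9, 1 / 9, 5 / 18)
for row in poisson:
    print(row)
```

A dense matrix is only sensible for small $m$; for large problems one would store just the nonzero entries or avoid forming the matrix altogether.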

    Averaging problem

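    $\lambda_{jk} = 2c + 2a\cos(j\pi h) + 2b\cos(k\pi h)$, $j,k = 1, 2, \ldots, m$.

    $a = b = 1/9$, $c = 5/18$

    $\lambda_{\max} = \dfrac59 + \dfrac49\cos(\pi h)$, $\lambda_{\min} = \dfrac59 - \dfrac49\cos(\pi h)$

    $\operatorname{cond}_2(A) = \dfrac{\lambda_{\max}}{\lambda_{\min}} = \dfrac{5 + 4\cos(\pi h)}{5 - 4\cos(\pi h)} \le 9$.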
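The bound on the condition number can be checked numerically from the eigenvalue formula. The following Python sketch assumes $h = 1/(m+1)$, as in the test code of these slides; the function names `extreme_eigenvalues` and `cond2` are illustrative.

```python
import math


def extreme_eigenvalues(m, a, b, c):
    """Smallest and largest lambda_jk = 2c + 2a cos(j pi h) + 2b cos(k pi h)."""
    h = 1.0 / (m + 1)
    eigs = [
        2 * c + 2 * a * math.cos(j * math.pi * h) + 2 * b * math.cos(k * math.pi * h)
        for j in range(1, m + 1)
        for k in range(1, m + 1)
    ]
    return min(eigs), max(eigs)


def cond2(m, a, b, c):
    """2-norm condition number lambda_max / lambda_min of the test matrix."""
    lam_min, lam_max = extreme_eigenvalues(m, a, b, c)
    return lam_max / lam_min


# Averaging parameters a = b = 1/9, c = 5/18 for a few grid sizes.
for m in (5, 20, 80):
    print(m, cond2(m, 1 / 9, 1 / 9, 5 / 18))
```

For the averaging parameters the printed values grow with $m$ but stay below 9, which is the size-independence used later when counting iterations.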

    2D formulation for test problems

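    $x = \operatorname{vec}(V)$, $r = \operatorname{vec}(R)$, $p = \operatorname{vec}(P)$

    $Ax = b \iff DV + VE = h^2 F$, $D = \operatorname{tridiag}(a,c,a) \in \mathbb{R}^{m,m}$, $E = \operatorname{tridiag}(b,c,b) \in \mathbb{R}^{m,m}$

    $Ap = \operatorname{vec}(DP + PE)$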

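    [Testing Conjugate Gradient] $A = \operatorname{trid}(a,c,a,m) \otimes I_m + I_m \otimes \operatorname{trid}(b,c,b,m) \in \mathbb{R}^{m^2,m^2}$

    function [V,it] = cgtest(m,a,b,c,tol,itmax)
    h = 1/(m+1); R = h*h*ones(m);
    D = sparse(tridiagonal(a,c,a,m)); E = sparse(tridiagonal(b,c,b,m));
    V = zeros(m,m); P = R; rho = sum(sum(R.*R)); rho0 = rho;
    for k = 1:itmax
        if sqrt(rho/rho0) <= tol        % relative residual small enough
            it = k; return
        end
        T = D*P + P*E;                  % matrix form of t = A*p
        a = rho/sum(sum(P.*T)); V = V + a*P; R = R - a*T;
        rhos = rho; rho = sum(sum(R.*R)); P = R + (rho/rhos)*P;
    end
    it = itmax + 1;                     % tolerance not reached within itmax iterations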
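The 2D formulation can also be sketched without forming any matrix at all. The Python version below is written in Python rather than the MATLAB of the listing, and applies $P \mapsto DP + PE$ directly on the grid. It uses the right-hand side $h^2 F$ with $F$ the all-ones matrix, as in `R = h*h*ones(m)`. The names `apply_operator`, `inner` and `cg_2d` are illustrative, and the iteration counter starts at 0, which is an assumption about how iterations are counted.

```python
def apply_operator(P, a, b, c):
    """Return T = D P + P E for D = tridiag(a, c, a) and E = tridiag(b, c, b)."""
    m = len(P)
    T = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            t = 2.0 * c * P[i][j]
            if i > 0:
                t += a * P[i - 1][j]
            if i < m - 1:
                t += a * P[i + 1][j]
            if j > 0:
                t += b * P[i][j - 1]
            if j < m - 1:
                t += b * P[i][j + 1]
            T[i][j] = t
    return T


def inner(X, Y):
    """Sum of elementwise products, i.e. sum(sum(X.*Y)) in MATLAB notation."""
    return sum(x * y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


def cg_2d(m, a, b, c, tol, itmax):
    """CG for D V + V E = h^2 F with F = ones(m, m); returns (V, iterations)."""
    h = 1.0 / (m + 1)
    R = [[h * h] * m for _ in range(m)]     # residual for the start V = 0
    V = [[0.0] * m for _ in range(m)]
    P = [row[:] for row in R]
    rho = inner(R, R)
    rho0 = rho
    for k in range(itmax + 1):
        if (rho / rho0) ** 0.5 <= tol:
            return V, k
        T = apply_operator(P, a, b, c)
        alpha = rho / inner(P, T)
        for i in range(m):
            for j in range(m):
                V[i][j] += alpha * P[i][j]
                R[i][j] -= alpha * T[i][j]
        rho_old, rho = rho, inner(R, R)
        beta = rho / rho_old
        for i in range(m):
            for j in range(m):
                P[i][j] = R[i][j] + beta * P[i][j]
    return V, itmax + 1


# Averaging problem on a 20 x 20 grid (n = 400 unknowns).
V, iterations = cg_2d(20, 1 / 9, 1 / 9, 5 / 18, 1e-8, 200)
print(iterations)
```

Each call to `apply_operator` touches every grid point a fixed number of times, so one iteration costs a number of operations proportional to $n = m^2$.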

    The Averaging Problem

    n    2 500   10 000   40 000   1 000 000   4 000 000

    K       22       22       21          21          20

    Table 1: The number of iterations K for the averaging problem on a $\sqrt{n} \times \sqrt{n}$ grid. $x_0 = 0$, tol $= 10^{-8}$.

    Both the condition number and the required number of iterations are independent of the size of the problem.

    The convergence is quite rapid.

    Poisson Problem

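    $\lambda_{jk} = 2c + 2a\cos(j\pi h) + 2b\cos(k\pi h)$, $j,k = 1, 2, \ldots, m$.

    $a = b = -1$, $c = 2$

    $\lambda_{\max} = 4 + 4\cos(\pi h)$, $\lambda_{\min} = 4 - 4\cos(\pi h)$

    $\operatorname{cond}_2(A) = \dfrac{\lambda_{\max}}{\lambda_{\min}} = \dfrac{1 + \cos(\pi h)}{1 - \cos(\pi h)} = \operatorname{cond}_2(T)$.

    $\operatorname{cond}_2(A) = O(n)$.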
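The growth rate $O(n)$ can be seen from a short worked estimate. It assumes $h = 1/(m+1)$ and $n = m^2$, and uses the half-angle identities together with $\sin t \approx t$ for small $t$:

```latex
\operatorname{cond}_2(A)
  = \frac{1+\cos(\pi h)}{1-\cos(\pi h)}
  = \frac{2\cos^2(\pi h/2)}{2\sin^2(\pi h/2)}
  = \cot^2\!\left(\frac{\pi h}{2}\right)
  \approx \frac{4}{\pi^2 h^2}
  = \frac{4(m+1)^2}{\pi^2}
  \approx \frac{4n}{\pi^2}
  \quad \text{for small } h .
```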

    The Poisson problem

    n             2 500   10 000   40 000   160 000

    K               140      294      587      1168

    K/\sqrt{n}     1.86     1.87     1.86      1.85

    Using CG in the form of Algorithm 8 with tolerance $10^{-8}$ and $x_0 = 0$ we list K, the required number of iterations, and $K/\sqrt{n}$.

    The results show that K is much smaller than n and appears to be proportional to $\sqrt{n}$.

    This is the same speed as for SOR and we don't have to estimate any acceleration parameter!

    $\sqrt{n}$ is essentially the square root of the condition number of A.

    The work involved in each iteration is

    1. one matrix times vector (t = Ap),

    2. two inner products ($p^T t$ and $r^T r$),

    3. three vector-plus-scalar-times-vector (x = x + ap, r = r - at and p = r + (rho/rhos)p).

    The dominating part of the computation is statement 1. Note that for our test problems A only has O(5n) nonzero elements. Therefore, taking advantage of the sparseness of A we can compute t in O(n) flops. With such an implementation the total number of flops in one iteration is O(n).

    More Complexity

    How many flops do we need to solve the test problems by the conjugate gradient method to within a given tolerance?

    Averaging problem: $O(n)$ flops. Optimal for a problem with n unknowns. Same as SOR and better than the fast method based on FFT.

    Discrete Poisson problem: $O(n^{3/2})$ flops. Same as SOR and fast method.

    Cholesky Algorithm: $O(n^2)$ flops both for averaging and Poisson.
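The conjugate gradient counts follow from multiplying the cost of one iteration, $O(n)$ flops, by the number of iterations $K$. As a rough sketch, taking $K$ bounded independently of $n$ for the averaging problem and $K$ proportional to $\sqrt{n}$ for the Poisson problem, as the tables suggest:

```latex
\text{flops} \approx K \cdot O(n): \qquad
\text{averaging: } K = O(1) \;\Rightarrow\; O(n), \qquad
\text{Poisson: } K = O(\sqrt{n}) \;\Rightarrow\; O(n\sqrt{n}) = O(n^{3/2}).
```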

    Analysis and Derivation of the Method

    Theorem 3 (Orthogonal Projection). Let $S$ be a subspace of a finite dimensional real or complex inner product space $(V, F, \langle\cdot,\cdot\rangle)$. To each $x \in V$ there is a unique vector $p \in S$ such that

    $\langle x - p, s\rangle = 0$, for all $s \in S$. (1)



    [Figure: the orthogonal projection $p$ of $x$ onto the subspace $S$, with $x - p$ orthogonal to $S$.]


    Best Approximation

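    Theorem 4 (Best Approximation). Let $S$ be a subspace of a finite dimensional real or complex inner product space $(V, F, \langle\cdot,\cdot\rangle)$. Let $x \in V$, and $p \in S$. The following statements are equivalent

    1. $\langle x - p, s\rangle = 0$, for all $s \in S$.

    2. $\|x - s\| > \|x - p\|$ for all $s \in S$ with $s \ne p$.

    If $(v_1, \ldots, v_k)$ is an orthogonal basis for $S$ then

    $p = \sum_{i=1}^{k} \dfrac{\langle x, v_i\rangle}{\langle v_i, v_i\rangle} v_i$. (2)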
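Formula (2) is easy to try out numerically. The Python sketch below uses the usual inner product on $\mathbb{R}^3$ and a small hand-picked orthogonal basis; the names `inner` and `project` and the sample vectors are illustrative choices.

```python
def inner(u, v):
    """Usual inner product on R^n."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def project(x, basis):
    """Orthogonal projection of x onto span(basis).

    The vectors in `basis` are assumed to be mutually orthogonal and nonzero.
    """
    p = [0.0] * len(x)
    for v in basis:
        coeff = inner(x, v) / inner(v, v)
        p = [p_i + coeff * v_i for p_i, v_i in zip(p, v)]
    return p


basis = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]   # orthogonal basis of the xy-plane
x = [3.0, 1.0, 2.0]
p = project(x, basis)
residual = [x_i - p_i for x_i, p_i in zip(x, p)]
print(p, [inner(residual, v) for v in basis])
```

Here the projection is $[3, 1, 0]^T$ and the difference $x - p$ is orthogonal to both basis vectors, which is statement 1 of the theorem.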

    Derivation of CG

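    $Ax = b$, $A \in \mathbb{R}^{n,n}$ is pos. def., $x, b \in \mathbb{R}^n$

    $(x, y) := x^T y$, $x, y \in \mathbb{R}^n$

    $\langle x, y\rangle := x^T A y = (x, Ay) = (Ax, y)$

    $\|x\|_A := \sqrt{x^T A x}$

    $W_0 = \{0\}$, $W_1 = \operatorname{span}\{b\}$, $W_2 = \operatorname{span}\{b, Ab\}$,

    $W_k = \operatorname{span}\{b, Ab, A^2 b, \ldots, A^{k-1} b\}$

    $W_0 \subset W_1 \subset W_2 \subset \cdots \subset W_k$, $\dim(W_k) \le k$, $w \in W_k \Rightarrow Aw \in W_{k+1}$

    $x_k \in W_k$, $\langle x_k - x, w\rangle = 0$ for all $w \in W_k$

    $p_0 = r_0 := b$, $p_j = r_j - \sum_{i=0}^{j-1} \dfrac{\langle r_j, p_i\rangle}{\langle p_i, p_i\rangle} p_i$, $j = 1, \ldots, k$.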

    Theorem 5. Suppose we apply the conjugate gradient method to a positive definite system $Ax = b$. Then the A-norms of the errors satisfy

    $\dfrac{\|x - x_k\|_A}{\|x - x_0\|_A} \le 2\left(\dfrac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k$, for $k \ge 0$,

    where $\kappa = \operatorname{cond}_2(A) = \lambda_{\max}/\lambda_{\min}$ is the 2-norm condition number of A.

    This theorem explains what we observed in the previous section. Namely that the number of iterations is linked to $\sqrt{\kappa}$, the square root of the condition number of A. Indeed, the following corollary gives an upper bound for the number of iterations in terms of $\sqrt{\kappa}$.

    Corollary 6. If for some $\epsilon > 0$ we have $k \ge \frac12 \ln\left(\frac{2}{\epsilon}\right)\sqrt{\kappa}$ then $\|x - x_k\|_A/\|x - x_0\|_A \le \epsilon$.
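To get a feeling for the size of this bound, the corollary can be evaluated numerically. The Python sketch below takes $\kappa = 9$, the upper bound on the condition number of the averaging matrix, and $\epsilon = 10^{-8}$. The function names `iteration_bound` and `error_bound` are illustrative.

```python
import math


def iteration_bound(kappa, eps):
    """Smallest integer k with k >= (1/2) ln(2/eps) sqrt(kappa)."""
    return math.ceil(0.5 * math.log(2.0 / eps) * math.sqrt(kappa))


def error_bound(kappa, k):
    """Right-hand side 2 ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))^k of the theorem."""
    s = math.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** k


k = iteration_bound(9.0, 1e-8)
print(k, error_bound(9.0, k))
```

With these values the corollary asks for 29 iterations, and the theorem's bound at that point is $2^{-28}$, which is below $10^{-8}$. The bound concerns the error in the A-norm, so it is not directly comparable with iteration counts obtained from a residual-based stopping test.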
