
  • Loughborough University Institutional Repository

    A theoretical and computational investigation of a generalized Polak-Ribiere algorithm for unconstrained optimization

    This item was submitted to Loughborough University's Institutional Repository by the author.

    Additional Information:

    A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

    Metadata Record: https://dspace.lboro.ac.uk/2134/13193

    Publisher: © K.M. Khoda

    Please cite the published version.

    https://dspace.lboro.ac.uk/2134/13193

  • This item was submitted to Loughborough University as a PhD thesis by the author and is made available in the Institutional Repository

    (https://dspace.lboro.ac.uk/) under the following Creative Commons Licence conditions.

    For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/


  • A THEORETICAL AND COMPUTATIONAL

    INVESTIGATION OF A GENERALIZED POLAK-RIBIERE ALGORITHM

    FOR UNCONSTRAINED OPTIMIZATION

    by

    KHAN MONZOOR-E-KHODA

    A Doctoral Thesis

    Submitted in partial fulfilment of the requirements

    for the award of the degree of

    Doctor of Philosophy

    of the Loughborough University of Technology

    February, 1992

    Supervisor: Professor C. Storey

    Department of Mathematical Sciences

    © by K. M. Khoda, 1992


  • This Thesis is dedicated to my Father and Mother

    as a token of my grateful appreciation

  • TABLE OF CONTENTS

    ACKNOWLEDGEMENTS vii

    SUMMARY OF THE THESIS viii

    CHAPTER 1 : INTRODUCTION 1

    1.1 General Nature of Optimization 1

    1.2 Unconstrained Optimization 2

    1.3 Scope and Organization of The Thesis 4

    CHAPTER 2 : MATHEMATICAL FOUNDATIONS 5

    2.1 Notation 5

    2.2 Background Material 7

    2.3 Gradient Methods of Minimization 7

    2.4 Line Search Strategies 12

    CHAPTER 3 : GENERALIZED POLAK-RIBIERE ALGORITHM 15

    3.1 Derivation of the Algorithm 16

    3.2 General Properties of the Algorithm 19

    3.3 Global Convergence Properties of the Algorithm 26

    3.4 Rate of Convergence 42

    3.5 Characteristic Behaviour and Basic Algorithm 53

    CHAPTER 4 : SOME MODIFICATIONS OF THE GPR

    ALGORITHM AND THEIR IMPLEMENTATIONS 58

    4.1 GPR Algorithm with Non-negative Beta 59


    4.2 GPR Algorithm with Powell Restart 65

    4.3 Shanno's Angle-test Restarting GPR Algorithm 67

    4.4 Efficiently Restarting GPR Algorithm 71

    4.5 Concluding Remarks 76

    CHAPTER 5 : MULTI-TERM RESTARTING GPR ALGORITHMS 77

    5.1 Beale Three-Term Restarting GPR Algorithm 78

    5.2 Nazareth Three-Term Restarting GPR Algorithm 85

    5.3 Concluding Remarks 90

    CHAPTER 6 : EXTENSION OF THE GPR ALGORITHM 91

    6.1 Theoretical Basis 91

    6.2 Algorithm Construction 93

    6.3 Implementation and Basic Algorithm 97

    6.4 Concluding Remarks 100

    CHAPTER 7 : COMPUTATIONAL EXPERIMENTS 101

    7.1 Line Search Algorithm 101

    7.2 Test Problems 105

    7.3 Numerical Results 118

    7.4 Discussion of the Results 128

    7.5 Concluding Remarks 130

    CHAPTER 8 : OPTIMIZED SOFTWARE FOR GENERAL PURPOSE USE 131

    8.1 Subroutine Structure 131

    8.2 User Interface of the GPR Routine 138

    8.3 User-specified Optional Parameters 142

    8.4 Precision of the Calculation 150

    8.5 Error Indicators 150

    8.6 Accuracy of the Solution 150

    8.7 Efficiency and Reliability 151

    8.8 A Numerical Example 151

    8.9 Concluding Remarks 155


    CHAPTER 9 : OTHER APPLICATIONS OF THE GPR ROUTINE 157

    9.1 Problems and Computational Performance 157

    9.2 Concluding Remarks 162

    CHAPTER 10 : FINAL CONCLUSIONS 164

    10.1 Summary and Comments 164

    10.2 Suggestions for Further Research 166

    APPENDIX A : SUMMARY OF FAILURES 167

    APPENDIX B : QUICK GUIDANCE 170

    APPENDIX C : COMPLETE RESULTS 174

    APPENDIX D : PROGRAM LISTINGS 282

    APPENDIX E : OTHER DEPENDENT SUBROUTINES 351

    REFERENCES 358

  • ACKNOWLEDGEMENTS

    I am greatly indebted to my supervisor Professor C. Storey of Loughborough

    University of Technology for his guidance and help throughout this work. I

    would like to acknowledge especially his enormous effort in correcting my written

    English. I would also like to thank him for introducing me to the interesting field

    of Optimization. I take this opportunity to express my gratitude to him for all

    the advice and encouragement I have received from him. I also acknowledge the

    productive interactions that I had with Professor Evans of Computer Studies. I

    wish to thank Louise and Helen for helping me with the typesetting.

    I am very grateful to my director of research Dr. A. C. Pugh for the enormous

    support he gave me throughout my candidature. I sincerely acknowledge all the

    assistance I obtained from Mr. R. Tallet and Dr. M.A. Rahin.

    I express my deep gratitude to my parents for their patience, to my brothers

    for their understanding and to my in-laws for rendering valuable support. I

    especially record my debt of gratitude to my father Professor A.F.M. Khodadad

    Khan, whose constant advice and encouragement has always been a source of

    inspiration for me. My gratitude is also due to my wife Ellora for her constant

    inspiration and mental support to achieve my goal.

    I would also like to express my gratitude to the Department of Mathematical

    Sciences of Loughborough University of Technology for supporting me throughout

    my candidature. I sincerely thank the Pilkington Library and the Computer

    Centre of Loughborough University of Technology for generously letting me use

    their facilities. Finally, I would like to gratefully acknowledge the Commonwealth

    Scholarship Commission and the British Council for awarding me a scholarship,

    during the tenure of which, this research was carried out.


  • SUMMARY OF THE THESIS

    TITLE

    A Theoretical and Computational Investigation of a Generalized Polak-

    Ribiere Algorithm for Unconstrained Optimization.

    ABSTRACT

    In this thesis, a new conjugate gradient type method for unconstrained

    minimization is proposed and its theoretical and computational properties investigated. This generalized Polak-Ribiere method is based on the study of the effects

    of inexact line searches on conjugate gradient methods. It uses search directions

    which are parallel to the Newton direction of the restriction of the objective

    function on a two dimensional subspace spanned by the current gradient and a

    suitably chosen direction in the span of the previous search direction and the

    current gradient. It is shown that the GPR method (as it is called) has excellent

    convergence properties under very simple conditions. An algorithm for the new

    method is formulated and various implementations of this algorithm are tested.

    The results show that the GPR algorithm is very efficient in terms of number

    of iterations as well as computational labour and has modest computer storage

    requirements.

    The thesis also explores extensions of the GPR algorithm by considering

    multi-term restarting procedures. Further generalization of the GPR method

    based on (m + 1)-dimensional Newton methods is also studied.

    Optimized software for the implementation of the GPR algorithm is developed for general purpose use. By considering standard test problems, the

    superiority of the proposed software over some readily available library software

    and over the straightforward Polak-Ribiere algorithm is shown. Software and

    user interfaces together with a simple numerical example and some more practical

    examples are described for the guidance of the user.


  • CHAPTER 1 INTRODUCTION

    This Thesis is an attempt to add to the theory of nonlinear optimization

    which, of late, has emerged as a useful branch of applied mathematics. In the introductory chapter, we discuss briefly the nature of optimization with special

    emphasis on the solution of unconstrained problems and give an outline of our

    work.

    1.1 General Nature of Optimization

    Optimization is concerned with getting the best from a given situation

    by analysing a set of alternative decisions. This is achieved by selecting a

    performance index for the situation under assessment, expressing it in terms of

    certain decision variables and then obtaining its best possible value by systematic

    adjustment of the variables. The choice of the performance index differs from

    situation to situation but generally involves some economic considerations, e.g.,

    maximum return on investment, minimum cost per unit yield, etc. It may

    also involve some technical considerations such as minimum time of production,

    maximum efficiency of machines and so on.

    Optimization problems arise in a variety of practical situations. The way

    in which the performance index is obtained from the variables of a problem

    also varies widely from one situation to another. In some cases, it can only be qualitatively described, whereas mathematical models of many other problems can

    be formulated in which the performance indices are described by some suitably

    defined objective functions. In the latter case, the problem then reduces to a mathematical programming problem for finding the minimum or maximum value

    of the objective function.


    Mathematical modeling of optimization in many real-life situations leads

    to constrained problems in which the variables are restricted in some way -

    sometimes by having simple upper and lower bounds and sometimes by complex

    functional constraints. In fact, many complex problems such as, for instance, the

    production policy of a big company and the management of a large network are

    best treated by decomposing them into separate subproblems - each subproblem

    having constraints which are imposed to restrict its scope. On the other hand,

    many constrained problems can be converted to unconstrained ones in which the

    variables are free to assume all possible values, either by broadening the scope

    of the problem or by eliminating some variables using the constraints. Moreover,

    the unconstrained problems represent a significant class of practical problems.

    Optimization problems have attracted the attention of researchers for a long

    time. The earlier problems investigated were geometrical in nature. Later on,

    with the development of calculus, a formal theory of optimization grew up. This

    classical theory, though rich in theoretical content, is not of much practical value

    in numerical computation, especially in dealing with large-scale problems.

    Since the advent of electronic computers in the nineteen forties, there has

    been a rapid development of theory and practice of optimization. There is now a

    massive literature on the subject and vigorous research is still in progress creating

    new theory and testing various algorithms. Recent advances in the power and

    storage capacities of digital computers have made it possible to deal with large-

    scale optimization problems efficiently.

    1.2 Unconstrained Optimization

    A static unconstrained optimization problem is concerned with finding a local

    minimum or maximum of a prescribed real-valued function f : R^n → R of n real variables without any constraint on the variables. Without loss of generality, one

    may restrict consideration to minimization problems only, because maximization

    can be dealt with by minimization of −f(x_1, …, x_n).

    Numerous methods have been devised for solving general minimization

    problems, the choice and suitability of any particular method being dependent

    on the nature and size of the problem. These methods are, in general, iterative

    in nature and give procedures for obtaining a sequence of approximate solutions


    converging to the actual solution. In practice, such methods start at an initial

    estimate of the minimizer and then proceed, according to some fixed rule, to

    better and better approximations, terminating at the actual minimizer or at an

    acceptable (according to pre-set standards) approximation of the minimizer after

    a finite number of iterations. For surveys of some of these techniques, we refer to

    Dennis and Schnabel[D1], Gill, Murray and Wright[G1], Wolfe[W1], Walsh[W4], Zoutendijk[Z1].

    There are some methods in which the generation of the minimizing sequence

    is based simply on comparison of values of the objective function and no use of

    derivatives is made. These so-called direct search methods were once thought to be

    useful in dealing with problems in which the objective function is not differentiable

    or its partial derivatives are hard to evaluate. They are, however, very crude and

    generally prove to be less efficient than methods making use of derivative values

    no matter how these have to be evaluated.

    Problems involving smooth objective functions are best dealt with by

    gradient methods. In such methods the minimizing sequence is generated by

    determining at each step a direction of search and then locating the best possible

    estimate of the minimum point in the line of that direction through an appropriate

    choice of the steplength. The search direction at each step, constructed using the

    gradient values and sometimes the Hessian values also, is required to be such

    that function values initially decrease in that direction. The primary differences

    between various gradient methods rest with the way in which the successive

    search directions are constructed. Once this is done, all such algorithms call for

    choosing the minimum point on the corresponding line (exact line search), though,

    in practice, one is satisfied if the steplength satisfies some accepted minimizing

    criterion (inexact line search).

    The development of efficient algorithms for solving unconstrained optimiza-

    tion problems is still an important area of research. This importance is derived

    not only from the desire to solve unconstrained problems, but also from the use

    made of these algorithms in constrained optimization. Indeed, unconstrained

    optimization lies at the heart of the whole of nonlinear optimization.

    In the next chapter, we shall give a short account of some gradient methods

    of unconstrained minimization as an introduction to our work.


    1.3 Scope and Organization of The Thesis

    In this thesis, we are concerned with the static unconstrained optimization

    problem

    P : Minimize f(x), x ∈ R^n,

    where the objective function f : R^n → R is, in general, a nonlinear function and is at least twice continuously differentiable. Our study begins with a short review of

    some basic results and solution techniques in Chapter 2. Then in Chapter 3, we

    develop a new conjugate-gradient type algorithm which is a generalization of the

    Polak-Ribiere algorithm and discuss its theoretical and algorithmic properties.

    This algorithm, referred to as the Generalized Polak-Ribiere (GPR, in short)

    Algorithm in the sequel, is extended and further examined in Chapter 4 and

    Chapter 5. An (m + 1)-dimensional version of the GPR Algorithm is considered in Chapter 6 and various computational results are discussed in Chapter 7. The

    efficiency of the Algorithm and optimized software for its implementation (called

    the GPR Routine) are investigated in Chapter 8. The GPR Routine is applied to

    some practical problems in Chapter 9 and final conclusions are made in Chapter

    10.

  • CHAPTER 2 MATHEMATICAL FOUNDATIONS

    In this Chapter, we set out the notation to be used throughout the Thesis,

    discuss some basic results and give short accounts of some solution techniques.

    2.1 Notation

    In this study, the Euclidean n-space will be denoted by R^n with R^1 = R, the real line. The points x in R^n will be considered as column vectors:

    x = (x_1, x_2, …, x_n)^T,   (2.1.1)

    the corresponding row vector being

    x^T = (x_1, x_2, …, x_n).   (2.1.2)

    The subscript i, always ranging from 1 to n (unless otherwise specified), will be reserved to indicate vector components, whereas the superscript (k) will be used to distinguish vectors as x^(1), x^(2), …. We shall write x^T z and ||x|| to indicate the Euclidean inner product and norm respectively:

    x^T z = Σ_{i=1}^n x_i z_i,   (2.1.3)

    ||x|| = (x^T x)^{1/2}.   (2.1.4)

    B(x, ε) will denote the ε-ball about x in R^n:

    B(x, ε) = {z ∈ R^n : ||z − x|| < ε}.   (2.1.5)

    The elements of a matrix will be indicated by double subscripts, the first

    index indicating the row and the second index the column. For an n × n matrix A, ||A|| will denote the induced Euclidean norm.

    Our notation for the objective function will always be f(·) in the general case and q(·) in the quadratic case. The gradient vector and the Hessian matrix of the objective function will be denoted by g(·) and G(·) respectively. Thus, in the general case with f : R^n → R,

    g(x) = ∇f(x),  G(x) = ∇²f(x).   (2.1.6)

    In an iterative process for finding the minimum of f(x), we shall denote the starting point by x^(1) and the subsequent iterates by x^(2), x^(3), etc., and write

    d^(k) = x^(k+1) − x^(k),  y^(k) = g^(k+1) − g^(k).   (2.1.7)

    The search direction at the k-th step will be denoted by s^(k) and the steplength in this direction by α^(k), so that

    x^(k+1) = x^(k) + α^(k) s^(k).   (2.1.8)

    C^r will denote the class of r-times continuously differentiable functions f : R^n → R.

    (v^(1) ∧ v^(2)) will denote the angle between the two vectors v^(1) and v^(2).

    k ∈ n_1, n_2 will be used to mean that the integer variable k may assume values n_1 through n_2.

    For x^(1), x^(2) ∈ R^n with x^(1) ≠ x^(2), the line-segment from x^(1) to x^(2) will be denoted by [x^(1), x^(2)] when end points are included and by (x^(1), x^(2)) when end points are excluded.

    ≜ will be used to indicate a definition and ∎ will mean the end of a proof.

    of a proof.

    For convenience of reference, we shall number some statements (equations).

    This will be done serially in a section, and will be referred to as (a.b.c), where a is

    the chapter number, b is the section number and c is the statement number. The

    introductory portion of a chapter is numbered section 0.

    The lemmas, propositions and theorems will be numbered serially in a

    chapter as a.b, where a is the chapter number and b = 1,2, etc.

    The tables and figures will also be numbered serially in a section as (a.b.c),

    where a and b are the chapter and section number respectively and c = 1,2, etc.

    2.2 Background Material

    We shall freely use various notions and results from analysis, linear algebra

    and optimization theory in our work. All the relevant material used can be

    found in standard texts in analysis, linear algebra and optimization (for example,

    the text by Dennis and Schnabel[D1] has introductory sections dealing with this

    background material).

    2.3 Gradient Methods of Minimization

    As remarked in Section 1.2, a gradient method for minimizing a smooth nonlinear function f(x) under no constraints calls for generating a search direction s^(k) at each iteration and a steplength α^(k) in that direction so as to determine the next point

    x^(k+1) = x^(k) + α^(k) s^(k),   (2.3.1)

    satisfying the descent criterion

    f(x^(k+1)) < f(x^(k)).   (2.3.2)


    The process stops at x^(m) if g^(m) = 0 or yields a sequence {x^(k)} of points converging to an approximation to a local minimizer x* which satisfies some convergence criterion.

    One of the oldest methods is the method of steepest descent, first introduced

    by Cauchy[C1]. In this method the directions of search are taken as

    s^(k) = −g^(k).   (2.3.3)

    This choice is motivated by the fact that, local to the current approximation, the

    negative gradient direction is the direction along which the function decreases

    most rapidly. The steepest descent algorithm, though simple and stable (that

    is, reduces the function value at each step), has the disadvantage of linear

    convergence which may, at times, be extremely slow, and so it is not suitable

    for practical use.
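    To make the iteration concrete, here is a minimal sketch of steepest descent in Python; the function names, the crude halving rule standing in for a proper line search, and the tolerance are illustrative assumptions, not prescriptions from the thesis:

```python
import numpy as np

def steepest_descent(f, grad, x, tol=1e-6, max_iter=10000):
    # s(k) = -g(k) at every step, as in (2.3.3); a crude halving rule
    # stands in for the line search strategies of Section 2.4.
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stop near a stationary point
            break
        s, alpha, fx = -g, 1.0, f(x)
        while f(x + alpha * s) >= fx and alpha > 1e-16:
            alpha *= 0.5                     # enforce only the descent criterion (2.3.2)
        x = x + alpha * s
    return x
```

    On an ill-conditioned quadratic this sketch exhibits exactly the slow linear convergence just described.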

    Another basic minimization technique is the Newton method, based on the classical Newton method for solving nonlinear equations (Fletcher[F1], Dennis and Schnabel[D1], Gill, Murray and Wright[G1]). In this method, the directions of search are calculated from

    s^(k) = −[G^(k)]^{−1} g^(k),   (2.3.4)

    or equivalently from the linear system

    G^(k) s^(k) = −g^(k).   (2.3.5)

    The idea behind this method is that a function may be locally approximated

    by a quadratic whose minimum can be reached in one step by the above choice

    of direction. The Newton algorithm for a general function is not necessarily

    convergent, but for C² functions with positive definite Hessian at x*, convergence is quadratic under mild restrictions on f (Fletcher[F1], Wolfe[W1]) if x^(k) is near enough to x* for some k. This rapid convergence property makes the method

    extremely efficient in many cases. However, the method has the disadvantages

    that it involves a large amount of computation at each step, in the way of

    calculating and inverting the Hessian, or solving a system of linear equations

    and it requires quite a large amount of storage in its implementation.
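    A sketch of one Newton iteration under the same illustrative conventions; solving the linear system (2.3.5) is preferred to forming the inverse explicitly:

```python
import numpy as np

def newton_step(grad, hess, x):
    # Solve G(x) s = -g(x) for the Newton direction (2.3.5); this costs
    # O(n^3) work and O(n^2) storage per iteration - the drawback noted above.
    s = np.linalg.solve(hess(x), -grad(x))
    return x + s   # unit steplength; in practice a line search safeguards the step
```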


    In a bid to eliminate some of the computational disadvantages of the Newton method, the so-called quasi-Newton (abbreviated as QN) methods have been developed. These methods, first introduced by Davidon[D4] and later clarified by Fletcher and Powell[F4], have the general feature that the search directions are given by

    s^(k) = −H^(k) g^(k),   (2.3.6)

    where H^(k) is an approximation to [G^(k)]^{−1} (or G^(k) itself) with H^(1) symmetric, positive-definite (usually, H^(1) = I_n, the n × n identity matrix) and the so-called quasi-Newton condition

    H^(k+1) y^(k) = d^(k)   (2.3.7)

    holds. Besides the Davidon-Fletcher-Powell (DFP) updating formula

    H^(k+1) = H^(k) + d^(k) d^(k)T / (d^(k)T y^(k)) − H^(k) y^(k) y^(k)T H^(k) / (y^(k)T H^(k) y^(k)),   (2.3.8)

    there are now a variety of QN procedures differing in the ways in which the matrices H^(k) are updated (Fletcher[F1], Dennis and Schnabel[D1], Dennis and More[D5]). A well-known group of updating matrices is Broyden's θ-family (Broyden[B4]):

    H^(k+1) = H^(k) − u^(k) u^(k)T / v^(k) + d^(k) d^(k)T / η^(k) + (θ^(k) / v^(k)) (u^(k) − (v^(k)/η^(k)) d^(k)) (u^(k) − (v^(k)/η^(k)) d^(k))^T,   (2.3.9a)

    where

    η^(k) = d^(k)T y^(k),  u^(k) = H^(k) y^(k),  v^(k) = u^(k)T y^(k)   (2.3.9b)

    and θ^(k) is a free parameter. The DFP formula is a particular member of this class (θ^(k) = 0). Another particular member (θ^(k) = 1) is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula (Broyden[B4], Fletcher[F5], Goldfarb[G5], Shanno[S7]):

    H^(k+1) = H^(k) + (1 + y^(k)T H^(k) y^(k) / η^(k)) d^(k) d^(k)T / η^(k) − (d^(k) y^(k)T H^(k) + H^(k) y^(k) d^(k)T) / η^(k),   (2.3.10)


    which is still considered to be the most effective of the QN methods (Shanno and

    Phua[S5]).
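    As an illustration of the recurrences above, a sketch of the BFGS inverse-Hessian update (2.3.10) in Python; the argument names are illustrative, with d and y the difference vectors of (2.1.7):

```python
import numpy as np

def bfgs_update(H, d, y):
    # H approximates the inverse Hessian; d = x(k+1) - x(k), y = g(k+1) - g(k).
    # The update preserves symmetry and, when d'y > 0, positive-definiteness.
    dy = d @ y                       # curvature term d^T y
    Hy = H @ y
    return (H
            + (1.0 + (y @ Hy) / dy) * np.outer(d, d) / dy
            - (np.outer(d, Hy) + np.outer(Hy, d)) / dy)
```

    The n × n matrix H carried between iterations is exactly the storage burden that motivates the conjugate gradient methods discussed next.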

    The QN methods have a serious disadvantage, in the case of large scale

    problems, and that is the need to store matrices in their implementation. At-

    tempts to avoid this difficulty have stimulated research in the area of conjugate

    gradient (abbreviated as CG) methods which call only for vectors in their

    implementation. Originally proposed by Hestenes and Stiefel[H1] to solve systems

    of linear equations, the CG method was first applied to minimization problems by

    Fletcher and Reeves[F2]. The underlying idea is that the minimum of a quadratic function

    q(x) = (1/2) x^T A x + b^T x + c,   (2.3.11)

    where A is symmetric and positive-definite, is obtained in at most n steps through exact line search along each of n mutually A-conjugate directions. In this case, the CG search directions are chosen as

    s^(k) = −g^(k) for k = 1,  s^(k) = −g^(k) + β^(k) s^(k−1) for k > 1,   (2.3.12)

    satisfying the descent condition

    g^(k)T s^(k) < 0   (2.3.13)

    at each step, with the β^(k) chosen so that the conjugacy conditions

    s^(i)T A s^(j) = 0,  i, j ∈ 1, n;  i ≠ j   (2.3.14)

    are satisfied.

    Several formulae for β^(k) have been obtained. Of these, the FR formula (Fletcher and Reeves[F2])

    β_FR^(k) = ||g^(k)||² / ||g^(k−1)||²,   (2.3.15)

    the PR formula (Polak and Ribiere[P1])

    β_PR^(k) = g^(k)T (g^(k) − g^(k−1)) / ||g^(k−1)||²   (2.3.16)


    and the HS formula (Hestenes and Stiefel[H1], Sorenson[S1])

    β_HS^(k) = g^(k)T (g^(k) − g^(k−1)) / (s^(k−1)T (g^(k) − g^(k−1)))   (2.3.17)

    are often used. These different formulae for β are completely equivalent on quadratics when exact line searches are used. They can also be used on general nonlinear functions f(·), but then their computational behaviour and efficiency differ considerably from one formula to another.
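    For concreteness, a sketch of the CG direction update with the PR choice of β (illustrative Python; g_new, g_old and s_old denote the current gradient, previous gradient and previous search direction):

```python
import numpy as np

def pr_direction(g_new, g_old, s_old):
    # Polak-Ribiere beta (2.3.16) inserted into the CG recurrence (2.3.12);
    # only a few n-vectors are needed - no matrices are stored.
    beta = g_new @ (g_new - g_old) / (g_old @ g_old)
    return -g_new + beta * s_old
```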

    Theoretical and computational properties of different CG methods have been

    investigated by many authors (Beale[B1], Crowder and Wolfe[C3], Powell[P3],

    Baptist and Stoer[B6], Stoer[S6], Cohen[C2], Fletcher[F3], Shanno[S2], Hu and

    Storey[H3], Wolfe[W2,W3]). Though the FR method has nice global convergence

    under very mild conditions (Zoutendijk[Z2], Powell[P5], Al-Baali[A1]), no such

    satisfactory global convergence results are available for the PR and HS methods

    (Gilbert and Nocedal[G3]). It has also been observed (Powell[P4]) that the PR

    method is unlikely to have global convergence without some restrictive conditions.

    On the other hand, the numerical performance of the PR method has been found

    to be superior to that of the FR method in most cases. Recently, some quite

    efficient hybrid CG methods have been proposed (Touati-Ahmed and Storey[Tl],

    Gilbert and Nocedal[G3]). Attempts to improve upon the performance of the

    CG methods have also led to some generalizations. These include Beale's and

    Nazareth's three term recurrence methods (Beale[B1], Dixon, Ducksbury and Singh[D2], Nazareth[N4,N3], Dixon[D3]) and the generalized CG method of Liu and Storey[L2]. This latter method (abbreviated as the LS method) is in fact a two-dimensional Newton method in the sense that it uses as the next search direction s^(k) the Newton direction of the restriction of f on span{g^(k), s^(k−1)}, with g^(k) the current gradient and s^(k−1) the previous search direction. Thus the LS algorithm uses the search direction

    s^(k) = −(λ^(k) g^(k) + μ^(k) s^(k−1)),   (2.3.18a)

    with

    (λ^(k), μ^(k))^T = [ g^(k)T G^(k) g^(k)   g^(k)T G^(k) s^(k−1) ; s^(k−1)T G^(k) g^(k)   s^(k−1)T G^(k) s^(k−1) ]^{−1} [ g^(k)T g^(k) ; g^(k)T s^(k−1) ].   (2.3.18b)


    Both the QN and the CG methods have their advantages and disadvantages.

    There have been several attempts to combine the two methods so as to obtain

    algorithms with the good convergence properties of the QN methods and low stor-

    age requirements of the CG methods. Work along these lines includes Perry[P6], Shanno[S3,S4], Buckley[B5], Shanno and Phua[S8], Nazareth[N1], Nocedal[N2], Buckley and LeNir[B2,B3], Liu and Nocedal[L1] and Gill and Murray[G4]. As a

    result, some variable storage CG methods or limited memory QN methods have

    been developed having a good trade-off between memory and efficiency.

    2.4 Line Search Strategies

    Any descent method of function minimization involves a one-dimensional

    line search at each iteration for locating the next acceptable approximation to the minimizer. Thus, at the current point x^(k), if g^(k) ≠ 0, we choose a descent direction s^(k) satisfying (2.3.13) and then determine an admissible steplength α^(k) > 0 such that the descent criterion (2.3.2) is satisfied at the next point x^(k+1) defined by (2.3.1). The descent condition (2.3.13) ensures that for all sufficiently small α > 0, f(x^(k) + αs^(k)) < f(x^(k)), and hence one can always choose α^(k) > 0 such that (2.3.2) holds. In practice any α ∈ (0, α_max^(k)), where

    α_max^(k) ≜ min{α > 0 : f(x^(k) + αs^(k)) = f(x^(k))}  (α_max^(k) ≜ ∞ if no such α exists),   (2.4.1)

    is accepted as α^(k), subject to certain conditions to ensure a sufficient decrease f^(k) − f^(k+1) in the function value. Notice that α_max^(k) is the least positive number for which f(x^(k) + α_max^(k) s^(k)) = f(x^(k)) if such a number exists; otherwise α_max^(k) = ∞.

    In exact line search at x^(k), the steplength α^(k) is taken to be the value of α that minimizes the function

    φ^(k)(α) ≜ f(x^(k) + αs^(k))   (2.4.2)

    in (0, α_max^(k)), provided such a minimizer exists. Thus, according to exact line search,

    α^(k) = arg min{φ^(k)(α) : α ∈ (0, α_max^(k))}.   (2.4.3)

    Assuming the existence of stationary points of φ^(k)(·) in (0, α_max^(k)), we then have the exact line search condition

    g^(k+1)T s^(k) = 0.   (2.4.4)


    The determination of α^(k) by exact line search involves the minimization of the nonlinear function φ^(k), or solving the nonlinear equation φ^(k)′(α) = 0, which is usually expensive to carry out. Moreover, φ^(k) may not have a minimizer or a stationary point in (0, ∞). Therefore, exact line search has only theoretical importance and in practice, alternative inexact line search strategies are preferred.

    Indeed, many efficient line search techniques have been proposed and tested.

    These are, in fact, based on a "one dimensional" minimization using a combination

    of interval reduction and quadratic or cubic interpolation techniques depending

    on the availability of gradient information. For a discussion of such inexact line

    search methods, we refer to Fletcher[F1], Dennis and Schnabel[D1], Gill, Murray and Wright[G1], Wolfe[W1].

    In choosing a steplength α^(k) at a current point x^(k), we need to stay away from the end points of the interval (0, α_max^(k)) in order to produce a significant decrease in the function value. The Goldstein requirement (Goldstein[G6])

    f^(k) − f^(k+1) ≥ −c_1 α^(k) g^(k)T s^(k),   (2.4.5)

    with 0 < c_1 < 1/2, ensures that α^(k) is not too close to α_max^(k) by restricting the average rate of decrease of f(x) in moving from x^(k) to x^(k+1) along s^(k) to be at least some prescribed fraction of the initial rate of decrease in that direction (see Figure 2.4.1 below). On the other hand, the Wolfe condition (Wolfe[W2,W3])

    g^(k+1)T s^(k) ≥ c_2 g^(k)T s^(k),   (2.4.6)

    with 0 < c_2 < 1, ensures that α^(k) is not too small by requiring the rate of decrease of f at x^(k+1) in the direction s^(k) to be larger than some prescribed fraction of the initial rate of decrease (see Figure 2.4.1 below). The restriction 0 < c_1 < c_2 < 1 guarantees that (2.4.5) and (2.4.6) can be satisfied by some α^(k) ∈ (0, α_max^(k)) (Wolfe[W2], Powell[P8]).

    In recent studies, the strong Wolfe condition

    |g^(k+1)T s^(k)| ≤ −c_2 g^(k)T s^(k),   (2.4.7)

    together with the Goldstein condition (2.4.5) subject to 0 < c_1 < c_2 < 1, are often preferred as line search requirements (Fletcher[F1], Al-Baali[A1], Al-Baali and Fletcher[A2], Liu and Storey[L2]). We call the combination of these two conditions the Wolfe-Powell Conditions. Conditions (2.4.5) and (2.4.7) are sometimes referred to as strong Wolfe conditions (Gilbert and Nocedal[G3]).

    Figure 2.4.1. Permissible range for α^(k) under conditions (2.4.5) and (2.4.6): (2.4.5) holds on an interval reaching from 0 to a point short of α*, (2.4.6) on an interval bounded away from 0, and both hold on the overlap.

    It is remarked that the value of c_2 determines the accuracy with which α^(k) approximates a stationary point of f along s^(k), and consequently provides a means of controlling the balance of effort to be expended in computing α^(k). In general, the smaller the value of c_2, the more accurate the line search is. Obviously, if c_2 = 0, the line search is exact.
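    For illustration, a Python sketch that tests a trial steplength against the Wolfe-Powell pair (2.4.5) and (2.4.7); the constants c1 and c2 are illustrative defaults, not values prescribed by the thesis:

```python
import numpy as np

def wolfe_powell_ok(f, grad, x, s, alpha, c1=1e-4, c2=0.1):
    # (2.4.5): sufficient average decrease; (2.4.7): strong curvature test.
    gTs = grad(x) @ s                    # initial rate of decrease, < 0
    x_new = x + alpha * s
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * gTs
    strong_curvature = abs(grad(x_new) @ s) <= -c2 * gTs
    return sufficient_decrease and strong_curvature
```

    A practical routine would bracket and interpolate until both tests pass; taking c2 closer to 0 tightens (2.4.7) towards the exact condition (2.4.4).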

  • CHAPTER 3 GENERALIZED POLAK-RIBIERE ALGORITHM

    In this Chapter, we develop a new type of conjugate gradient algorithm for

    finding a local solution to the problem

    [P] Minimize f(x), x ∈ R^n,

    and discuss various theoretical properties of the algorithm. The search directions

    in the algorithm, as we shall see, are generalizations of those in the Polak-Ribiere

    method, and so the algorithm is called the Generalized Polak-Ribiere Algorithm

    (GPR Algorithm, in short).

    In [P], the objective function f : R^n → R is, in general, nonlinear and it is assumed throughout the sequel (whether stated explicitly or not) that the following conditions hold:

    [AP-1] f is twice continuously differentiable.

    [AP-2] ∃ x^(1) ∈ R^n such that the level set

    L(x^(1)) ≜ {x ∈ R^n : f(x) ≤ f(x^(1))}   (3.0.1)

    is bounded.

    Additional conditions will be added whenever necessary.

    It may be observed that

    (i) By [AP-1], the Hessian G(x) is symmetric for all x ∈ R^n.

    (ii) By [AP-1], the level set L(x^(1)) in [AP-2] is closed and hence it is compact.

    (iii) The objective function f(·), the gradient g(·) and the Hessian G(·), being continuous, are bounded on the compact set L(x^(1)), with x^(1) as in [AP-2]. Defining

    M ≜ sup{||G(x)|| : x ∈ L(x^(1))},   (3.0.2)

    we have then

    ||G(x)|| ≤ M for all x ∈ L(x^(1)).   (3.0.3)

    3.1 Derivation of the Algorithm

    We begin with an estimate x^(1) of a local minimizer x* of f and take the initial search direction as the steepest descent direction at x^(1):

    s^(1) = −g^(1).   (3.1.1)

    To determine the search direction s^(k) for the k-th iteration (k > 1) from the current point x^(k), we proceed as follows:

    Let F(x + αs) denote the quadratic approximation to f(x + αs), obtained by truncating the Taylor series expansion of f(x + αs):

    F(x + αs) = f(x) + α g^T s + (1/2) α² s^T G s,   (3.1.2)

    where g ≜ g(x) and G ≜ G(x). Assuming that G(x) is positive-definite, we can write

    min_α (F(x + αs) − f(x)) = −(1/2) (g^T s)² / (s^T G s) = −(1/2) V,   (3.1.3)

    where V = V(x, s) is given by

    V = (g^T s)² / (s^T G s),   (3.1.4)

    and the minimum occurs for

    α = −g^T s / (s^T G s).   (3.1.5)

    We now set

    s = −g + βp,   (3.1.6)

    where p is an arbitrary but fixed vector in R^n such that p and g are linearly independent and β is a nonzero real variable, and minimize (3.1.3) as a function of β. This demands that we choose β such that

    V(β) = (g^T(−g + βp))² / ((−g + βp)^T G (−g + βp)) = (g^T g − β g^T p)² / ((−g + βp)^T G (−g + βp))   (3.1.7)

    is maximal. Here the denominator is positive for all β in view of the positive-definiteness of G.

    The value of β for which (3.1.7) is maximal must satisfy the equation

    dV/dβ = 0,   (3.1.8)

    obtained by setting the derivative of (3.1.7) to zero. For g^T p = 0, the maximizing root is

    β = g^T G p / (p^T G p).   (3.1.9)

    In the general case, (3.1.8) has the two roots

    β_1 = g^T g / (g^T p)   (3.1.10)

    and

    β_2 = (g^T p · g^T G g − g^T g · g^T G p) / (g^T p · g^T G p − g^T g · p^T G p),   (3.1.11)

    provided the denominator in β_2 is nonzero. The search direction

    s_1 = −g + β_1 p

    corresponding to (3.1.10) is not a descent direction as g^T s_1 = 0 and is of no importance to us (in fact β_1 makes V take its minimum value 0). The search direction

    s_2 = −g + β_2 p

    corresponding to (3.1.11) forms the basis of the proposed algorithm. The LS algorithm, studied in Liu and Storey[L2] and Hu and Storey[H2], is also based on s_2. Notice that (3.1.11) reduces to (3.1.9) for g^T p = 0.
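    For the reader's convenience, a short verification (in LaTeX, using only the definitions above) of where the two roots come from; the displayed forms of (3.1.9)-(3.1.11) are as reconstructed here:

```latex
% Roots of dV/d\beta = 0 for V as in (3.1.7).
\[
V(\beta)=\frac{(g^Tg-\beta\,g^Tp)^2}{D(\beta)},\qquad
D(\beta)=(-g+\beta p)^T G\,(-g+\beta p).
\]
% Differentiating and clearing the positive factor D(\beta)^2, the
% numerator of dV/d\beta factors as
\[
\frac{dV}{d\beta}\;\propto\;(g^Tg-\beta\,g^Tp)
\Bigl[\,g^Tp\,D(\beta)-(g^Tg-\beta\,g^Tp)\bigl(g^TGp-\beta\,p^TGp\bigr)\Bigr].
\]
% The first factor vanishes at \beta_1 = g^Tg/g^Tp, for which g^Ts_1 = 0 and
% V(\beta_1) = 0; in the second factor the \beta^2 terms cancel, leaving a
% linear equation whose root is \beta_2 of (3.1.11), reducing to
% \beta = g^TGp/p^TGp when g^Tp = 0.
```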

    We now let

    p = s⁻ + γ g   (3.1.12)

    in span{s⁻, g}, where s⁻ is the search direction in the previous iteration, and γ ≠ 0 is determined so that

    g^T p = 0.   (3.1.13)

    This requires

    γ = −g^T s⁻ / (g^T g).   (3.1.14)

    The current search direction is then defined by (3.1.6) with p described by (3.1.12) and (3.1.14) and β given by (3.1.9).

    If we denote p by s̄, we then have the following iterative process for the GPR algorithm from the initial estimate x^(1) for the minimizer x*:

    s^(k) = −g^(1) for k = 1,   (3.1.15a)

    s^(k) = −g^(k) + β_GPR^(k) s̄^(k−1) for k > 1,   (3.1.15b)

    s̄^(k−1) = s^(k−1) − (g^(k)T s^(k−1) / g^(k)T g^(k)) g^(k) for k > 1,   (3.1.15c)

    β_GPR^(k) = g^(k)T G^(k) s̄^(k−1) / (s̄^(k−1)T G^(k) s̄^(k−1)) for k > 1.   (3.1.15d)

  • Chapter 9: Generalized Polak-Ribiere Algorithm 19

    It may be remarked that the stopping condition will be activated whenever g^(k) = 0 at any iteration and so we can assume that g^(k) ≠ 0 as long as the iteration continues. Moreover, it follows from (3.1.15c) that

    g^(k)T s̄^(k−1) = 0   (3.1.16)

    and hence, from (3.1.15b), we have

    g^(k)T s^(k) = −||g^(k)||² < 0   (3.1.17)

    as long as g^(k) ≠ 0. This shows that:

    Proposition 3.1. In the GPR Algorithm, s^(k) is a descent direction from x^(k).

    The steplength α^(k) at each iteration is determined by a one-dimensional line search (see Section 2.4) along s^(k) so that

    f(x^(k+1)) < f(x^(k)).   (3.1.18)

    For an exact line search,

    α^(k) = arg min{f(x^(k) + α s^(k)) : α > 0},   (3.1.19)

    so that the exact line search condition

    g^(k+1)T s^(k) = 0   (3.1.20)

    holds.
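    A minimal Python sketch of one GPR direction computation, using the form of (3.1.15) as reconstructed above; hess_vec is an illustrative stand-in for however the product G^(k) v is obtained (exactly, or by a finite difference of gradients):

```python
import numpy as np

def gpr_direction(g, s_prev, hess_vec):
    # (3.1.15c): remove the gradient component so that g' s_bar = 0.
    s_bar = s_prev - (g @ s_prev) / (g @ g) * g
    Gs = hess_vec(s_bar)                 # G(x_k) s_bar, or an approximation
    beta = (g @ Gs) / (s_bar @ Gs)       # (3.1.15d)
    return -g + beta * s_bar             # (3.1.15b)
```

    In a matrix-free implementation one might take, say, hess_vec = lambda v: (grad(x + h*v) - grad(x)) / h for a small h; this keeps the storage requirement at a few n-vectors.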

    3.2 General Properties of the Algorithm

    In this section we consider some general properties of the GPR algorithm, initiated at x^(1) satisfying [AP-2]. Besides the general problem [P], the quadratic case, namely,

    [Q] Minimize q(x), x ∈ R^n,

    where

    q(x) = (1/2) x^T A x + b^T x + c,   (3.2.1)

    will also be considered. In dealing with [Q], it will be assumed throughout the sequel that

    [AQ] The Hessian A is symmetric and positive-definite.

    It may be noted that for the quadratic function q(·),

    g(x) = Ax + b,  G(x) = A   (3.2.2)

    and hence

    y^(k) = α^(k) A s^(k),   (3.2.3)

    where y^(k) is as in (2.1.7). In this case, the steplength satisfies

    (3.2.4)

    In case of exact line search, (3.2.4) becomes

    α^(k) = g^(k)T g^(k) / (s^(k)T A s^(k))   (3.2.5)

    by the exact line search condition (3.1.20) and the descent condition (3.1.17).

    In what follows, unless explicitly referred to [Q], we shall consider that the GPR algorithm is applied to [P].

    The GPR algorithm has the property that s^(k) is conjugate to s̄^(k−1) for all k > 1. Indeed, we have,

    Proposition 3.2. The GPR algorithm satisfies

    s^(k)T G^(k) s̄^(k−1) = 0   (3.2.6)

    for all k > 1.

    Proof. This follows directly from (3.1.15b) and (3.1.15d). ∎

    By applying the mean value theorem to g(·), we obtain, according to the GPR algorithm,

    y^(k) = g^(k+1) − g^(k) = (∫₀¹ G(x^(k) + t α^(k) s^(k)) dt) α^(k) s^(k)   (3.2.7)

    = G(ξ^(k)) α^(k) s^(k)   (3.2.8)

    for some ξ^(k) ∈ (x^(k), x^(k+1)).   (3.2.9)

    Now, if ||d^(k)|| = ||x^(k+1) − x^(k)|| is sufficiently small, then since G(·) is continuous, we can approximate G(ξ^(k)) by G^(k), and thus obtain

    y^(k) ≈ α^(k) G^(k) s^(k).   (3.2.10)

    So, in this case, if exact line searches are carried out (in which case s̄^(k−1) = s^(k−1)), we have the following results for the quadratic problem [Q]:

    Proposition 3.3. If the GPR algorithm is applied to [Q] and exact line searches are used throughout, then

    s^(k)T A s^(j) = 0,   (3.2.11)

    g^(k)T g^(j) = 0   (3.2.12)

    for all k > 1 and j ∈ 1, k − 1.

    Proposition 3.4. If the GPR algorithm is applied to [Q] and an exact line search is carried out at each iteration, then

    (3.2.13)

    for k > 1 and j ∈ 1, k − 1.

    Proposition 3.5. If the GPR algorithm is applied to [Q] with exact line searches, then the algorithm terminates at a stationary point x^(m+1) after m ≤ n iterations, where m is the number of distinct eigenvalues of A.

    Notice, however, that convergence is not, in general, obtained in a finite number of steps if the objective function is not quadratic, and the number of iterations required to attain a given accuracy depends upon the initial estimate x^(1) of the minimizer x*.

    We now consider some relations between the magnitudes of different quantities occurring in the GPR algorithm applied to the general problem [P], which we shall use in the subsequent analysis.

    Proposition 3.6. In the GPR algorithm,

    (a) ||s^(k)||² = ||g^(k)||² + (β_GPR^(k))² ||s̄^(k−1)||²,  k > 1,   (3.2.14a)

    (b) ||g^(k)|| ≤ ||s^(k)||,  k ≥ 1,   (3.2.14b)

    (c) ||s̄^(k)|| ≤ ||s^(k)||,  k ≥ 1.   (3.2.14c)

    Proof. From (3.1.15b), we get, ∀k > 1,

    ||s^(k)||² = (−g^(k) + β_GPR^(k) s̄^(k−1))^T (−g^(k) + β_GPR^(k) s̄^(k−1)) = ||g^(k)||² + (β_GPR^(k))² ||s̄^(k−1)||²,

    since g^(k)T s̄^(k−1) = 0 by (3.1.16); (b) and (c) then follow. ∎


    Proposition 3.8. For all k > 1,

    |β_GPR^(k)| ≤ (M/m) ||g^(k)|| / ||s̄^(k−1)||.   (3.2.19)

    Proof. From (3.1.15d), (3.0.2) and (3.2.18), we have, ∀k > 1,

    |β_GPR^(k)| ≤ ||g^(k)|| ||G^(k)|| ||s̄^(k−1)|| / (m ||s̄^(k−1)||²) ≤ (M/m) ||g^(k)|| / ||s̄^(k−1)||. ∎

    Proposition 3.9. For all k > 1,

    ||s^(k)|| ≤ (1 + M/m) ||g^(k)||.   (3.2.20)

    Proof. From (3.1.15b), we get, ∀k > 1,

    ||s^(k)|| ≤ ||g^(k)|| + |β_GPR^(k)| ||s̄^(k−1)|| ≤ ||g^(k)|| + (M/m) ||g^(k)||,

    using (3.2.19). ∎

    Proposition 3.10. There exists r > 0 such that

    cos θ^(k) ≥ r   (3.2.21)

    for all k, where θ^(k) ≜ (−g^(k) ∧ s^(k)).

    Proof. This follows from (3.2.15) and (3.2.20) with r = (1 + M/m)^{−1} > 0. ∎

    When the GPR algorithm is implemented with exact line search at each step, we have, from (3.2.7), using the exact line search condition (3.1.20),

    α^(k) = ||g^(k)||² / (s^(k)T G(ξ^(k)) s^(k)).   (3.2.22)


    Proposition 3.11. For all k,

    1 / (M (1 + M/m)²) ≤ α^(k) ≤ 1/m.   (3.2.23a)

    Proof. From (3.2.9) and (3.2.18), it follows that

    m ||s^(k)||² ≤ s^(k)T G(ξ^(k)) s^(k) ≤ M ||s^(k)||²   (3.2.23b)

    for all k and hence, by (3.2.22),

    ||g^(k)||² / (M ||s^(k)||²) ≤ α^(k) ≤ ||g^(k)||² / (m ||s^(k)||²).   (3.2.23c)

    Since, by (3.2.14b) and (3.2.20),

    (1 + M/m)^{−1} ≤ ||g^(k)|| / ||s^(k)|| ≤ 1,

    we have (3.2.23a). ∎

    Proposition 3.12. For all k,

    ||g^(k+1)|| ≤ (1 + M/m) ||s^(k)||.   (3.2.24)

    Proof. From (3.2.7) and (3.2.9), we have,

    g^(k+1) = g^(k) + α^(k) G(ξ^(k)) s^(k)

    for some ξ^(k) ∈ (x^(k), x^(k+1)) ⊂ L(x^(1)). Hence, using (3.2.14b), (3.2.23a) and (3.0.2), we conclude that ||g^(k+1)|| ≤ (1 + M/m) ||s^(k)||. ∎

    Proposition 3.13. For all k > 1,

    |β_GPR^(k)| ≤ (1 + M/m) (M/m).   (3.2.25)

    Proof. In view of exact line search, we have,

    s̄^(k−1) = s^(k−1)

    for all k > 1 and hence (3.2.25) is obtained from (3.2.19) using (3.2.24). ∎

    Proposition 3.14. For all k > 1,

    ||s^(k)|| ≤ (1 + M/m)² ||s^(k−1)||.   (3.2.26)

    Proof. This follows from (3.2.20) and (3.2.24). ∎


    3.3 Global Convergence Properties of the Algorithm

    In this section, we discuss global convergence properties of the GPR algorithm applied to [P] under standard line search strategies (as discussed in Section 2.4). Throughout the section, it is assumed that the conditions [AP-1] and [AP-2] hold and the GPR algorithm is initiated at x^(1) satisfying [AP-2].

    We first observe that, in view of Proposition 3.1, the GPR algorithm with exact line search satisfying conditions (3.1.19) and (3.1.20), or with inexact line search satisfying the Wolfe-Powell conditions

    [W-1] f^(k+1) ≤ f^(k) + c_1 α^(k) g^(k)T s^(k),   (3.3.1)

    [W-2] |g^(k+1)T s^(k)| ≤ −c_2 g^(k)T s^(k),   (3.3.2)

    where 0 < c_1 < c_2 < 1, leads to the inequality

    f^(k) − f^(k+1) ≥ ρ (g^(k)T s^(k))² / ||s^(k)||²   (3.3.3)

    with some ρ > 0. This is established in Proposition 3.15 and Proposition 3.16. The proofs depend on the descent property (3.1.17) and are valid for any descent algorithm.

    Proposition 3.15. If an exact line search is performed at each iteration with the GPR algorithm, then the inequality (3.3.3) with some ρ > 0 holds for all k.

    Proof. By the exact line search condition (3.1.19), we have,

    f^(k+1) = min{f(x^(k) + α s^(k)) : 0 < α < α_max^(k)},   (3.3.4a)

    where α_max^(k) is as defined in (2.4.1). But, for 0 < α < α_max^(k), we have, by the Taylor formula,

    f(x^(k) + α s^(k)) = f^(k) + α g^(k)T s^(k) + (1/2) α² s^(k)T G(x^(k) + ν α s^(k)) s^(k)

    for some ν ∈ (0, 1). Since the segment [x^(k), x^(k) + α s^(k)] ⊂ L(x^(1)), so, using (3.0.3), we have,

    f(x^(k) + α s^(k)) ≤ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||²   (3.3.4b)


    for 0 < α < α_max^(k). The quadratic polynomial

    p^(k)(α) ≜ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||²   (3.3.4c)

    attains its minimum value

    p_min^(k) = f^(k) − (g^(k)T s^(k))² / (2M ||s^(k)||²)

    at

    ᾱ^(k) = −g^(k)T s^(k) / (M ||s^(k)||²),

    which is positive by (3.1.17). Since p^(k)(α) is decreasing on (0, ᾱ^(k)) and increasing on (ᾱ^(k), ∞), it follows that ᾱ^(k) < α_max^(k) and hence

    f^(k+1) ≤ p^(k)(ᾱ^(k)) = f^(k) − (g^(k)T s^(k))² / (2M ||s^(k)||²),

    so that (3.3.3) holds with ρ = 1/(2M). ∎


    We next show that acceptable steplengths exist in line searches using the Wolfe-Powell conditions [W-1] and [W-2] for c_1, c_2 satisfying 0 < c_1 < 1/2 and c_1 < c_2 < 1. The proofs, though standard for any descent algorithm, are included for the sake of completeness.

    Lemma 3.17a. For any c_1 ∈ (0, 1/2), steplengths α^(k) > 0 can be determined in a line search at x^(k) with the GPR algorithm satisfying [W-1].

    Proof. For α > 0 such that [x^(k), x^(k) + α s^(k)] ⊂ L(x^(1)), we have, as in the case of (3.3.4b),

    f(x^(k) + α s^(k)) ≤ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||².   (3.3.6a)

    Notice that for any c_1 ∈ (0, 1/2), (3.1.17) implies

    f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||² ≤ f^(k) + c_1 α g^(k)T s^(k) for 0 < α ≤ α̃^(k) ≜ 2(1 − c_1) ||g^(k)||² / (M ||s^(k)||²).   (3.3.6b)

    Since f(x^(k) + α s^(k)) initially decreases along s^(k), either there exists a least positive α_max^(k) such that f(x^(k) + α_max^(k) s^(k)) = f^(k), or else f(x^(k) + α s^(k)) < f^(k) for all α > 0. In the first case, we notice, from (3.3.6a), that

    α_max^(k) ≥ 2 ||g^(k)||² / (M ||s^(k)||²),   (3.3.6c)

    which is greater than α̃^(k) for 0 < c_1 < 1/2. So, in either case, any positive α^(k) ≤ α̃^(k) will satisfy (3.3.1). ∎

    Lemma 3.17b. For any c_2 ∈ (0, 1), steplengths α^(k) > 0 can be determined in a line search at x^(k) with the GPR algorithm satisfying [W-2].


    Proof. The proof is by contradiction. Suppose that for some c_2 ∈ (0, 1) and all α > 0,

    |g(x^(k) + α s^(k))^T s^(k)| ≥ −c_2 g^(k)T s^(k),   (3.3.7a)

    that is, either

    g(x^(k) + α s^(k))^T s^(k) ≥ −c_2 g^(k)T s^(k) > 0   (3.3.7b)

    or

    g(x^(k) + α s^(k))^T s^(k) ≤ c_2 g^(k)T s^(k) < 0.   (3.3.7c)

    We notice that the function

    φ(t) ≜ f(x^(k) + t s^(k))

    has derivative

    φ′(t) = g(x^(k) + t s^(k))^T s^(k).   (3.3.7d)

    If (3.3.7b) holds, then (3.3.7d), (3.3.7b) and (3.1.17) imply that φ′(t) > 0 for all t ≥ 0 and hence φ(0) < φ(α), that is, f(x^(k)) < f(x^(k) + α s^(k)) for all α > 0. This contradicts the fact that s^(k) is a descent direction. So (3.3.7b) cannot hold.

    If (3.3.7c) holds, then it follows from (3.3.7d), (3.3.7c) and (3.1.17) that

    φ(α) − φ(0) = ∫₀^α φ′(t) dt ≤ −c_2 α ||g^(k)||²,

    that is,

    f(x^(k) + α s^(k)) ≤ f^(k) − c_2 α ||g^(k)||²   (3.3.7e)

    for all α > 0. It then follows that x^(k) + α s^(k) ∈ L(x^(1)) for all α > 0. But the continuous function f is bounded on the compact set L(x^(1)). Hence, ∃ N > 0 such that ∀α > 0, |f(x^(k) + α s^(k))| ≤ N. However, for α > (N + f^(k)) / (c_2 ||g^(k)||²),

    f(x^(k) + α s^(k)) < −N.

    This contradiction therefore shows that (3.3.7c) cannot hold, so the Lemma is proved. ∎

    Proposition 3.18. For any c_1, c_2 satisfying 0 < c_1 < 1/2 and c_1 < c_2 < 1, there exists an interval of acceptable steplengths α^(k) > 0 in a line search at x^(k) with the GPR algorithm satisfying [W-1] and [W-2].


    Proof. From Lemma 3.17b and (3.3.5c), it follows that steplengths α^(k) for which [W-2] holds satisfy

    α^(k) ≥ α̲^(k) ≜ (1 − c_2) ||g^(k)||² / (M ||s^(k)||²).   (3.3.8a)

    On the other hand, we have seen in Lemma 3.17a that [W-1] holds for any positive α^(k) ≤ α̃^(k), where α̃^(k) is as defined in (3.3.6b). But clearly,

    α̲^(k) < α̃^(k).   (3.3.8b)

    Hence, for any α^(k) in the interval [α̲^(k), α̃^(k)], both [W-1] and [W-2] hold simultaneously. ∎

    We now look into the convergence properties of the GPR algorithm executed without regular restarts. In this connection, some additional conditions are needed for establishing the convergence criterion

    lim_{k→∞} ||g^(k)|| = 0   (3.3.9a)

    or even the weaker criterion

    lim inf_{k→∞} ||g^(k)|| = 0.   (3.3.9b)

    The next two theorems establish some general conditions for the convergence of the GPR algorithm.

    Theorem 3.19. Suppose that in the GPR algorithm, g^(k) ≠ 0 for all k and at each iteration α^(k) is chosen so as to satisfy (3.3.3) for some ρ > 0. Assume that, in addition to the conditions [AP-1] and [AP-2], the following condition holds:

    [AP-4] The series Σ_{k=1}^∞ cos² θ^(k) is divergent, where θ^(k) ≜ (−g^(k) ∧ s^(k)).

    Then the limit (3.3.9b) is achieved.

    Proof. Suppose that (3.3.9b) does not hold. Then ∃ ε > 0 such that

    ||g^(k)|| ≥ ε   (3.3.10a)


    for all k. It then follows from (3.2.16), (3.3.10a) and (3.3.3) that

    cos² θ^(k) = (g^(k)T s^(k))² / (||g^(k)||² ||s^(k)||²) ≤ (1/(ρ ε²)) (f^(k) − f^(k+1))   (3.3.10b)

    and hence, ∀k ≥ 1,

    Σ_{j=1}^k cos² θ^(j) ≤ (1/(ρ ε²)) (f^(1) − f^(k+1)).   (3.3.10c)

    But the continuous function f(·) is bounded on the compact set L(x^(1)). Letting

    f_* = inf{f(x) : x ∈ L(x^(1))},   (3.3.10d)

    we thus have

    Σ_{j=1}^k cos² θ^(j) ≤ (1/(ρ ε²)) (f^(1) − f_*),   (3.3.10e)

    whence, by the monotone convergence property of positive term series, it follows that Σ_{k=1}^∞ cos² θ^(k) is convergent. This contradiction establishes the theorem. ∎

    Theorem 3.20. If in Theorem 3.19, the condition [AP-4] is replaced by the condition

    [AP-5] the sequence {cos θ^(k)} is bounded away from 0,

    then the limit (3.3.9a) is achieved by the GPR algorithm.

    Proof. For each k, we have,

    f^(k+1) = f^(1) + Σ_{j=1}^k (f^(j+1) − f^(j)).   (3.3.11a)

    Hence, using (3.3.3), (3.1.17) and (3.2.15), we get,

    f^(k+1) ≤ f^(1) − ρ Σ_{j=1}^k ||g^(j)||² cos² θ^(j),   (3.3.11b)


    where θ^(j) is as in [AP-4]. By [AP-5], ∃ e > 0 such that

    cos θ^(j) ≥ e   (3.3.11c)

    for all j, and hence,

    Σ_{j=1}^k ||g^(j)||² ≤ (1/(ρ e²)) (f^(1) − f_*)

    for all k. The series Σ ||g^(j)||² is therefore convergent, so ||g^(k)|| → 0 and the limit (3.3.9a) is achieved. ∎

    Theorem 3.21. Suppose that the GPR algorithm is executed with line searches satisfying (3.3.3) for some ρ > 0 whenever g^(k) ≠ 0. Assume that conditions [AP-1], [AP-2] and [AP-4] hold. Then either a finite sequence {x^(k)} is obtained whose last term x^(m) satisfies g(x^(m)) = 0 or else the sequence {x^(k)} has a limit point x* such that g(x*) = 0. If, instead of [AP-4], the condition [AP-5] is assumed, then g(x*) = 0 for all limit points x* of {x^(k)}.

    Proof. The iteration stops whenever g(x^(m)) = 0 for some m. Suppose now that g^(k) ≠ 0 for any k. Assume that [AP-4] holds. Then, by Theorem 3.19,

    lim inf_{k→∞} ||g^(k)|| = 0,


    and since {x^(k)} is a sequence in the compact set L(x^(1)), so, by standard results of analysis, there exists a subsequence {x^(k_j)} of {x^(k)} such that

    lim_{j→∞} g(x^(k_j)) = 0   (3.3.12a)

    and

    lim_{j→∞} x^(k_j) = x*   (3.3.12b)

    for some x* ∈ L(x^(1)). But then, by the continuity of g,

    g(x*) = lim_{j→∞} g(x^(k_j))

    and hence

    g(x*) = 0

    for the particular limit point x* of {x^(k)}.

    On the other hand, if [AP-4] is replaced by [AP-5], then it follows from Theorem 3.20 that

    lim_{k→∞} g(x^(k)) = 0.

    Hence, for any convergent subsequence {x^(k_j)} of {x^(k)} with lim_{j→∞} x^(k_j) = x*, we have,

    g(x*) = 0. ∎

    It may be remarked that if the sequence {x^(k)} has just one limit point x* (that is, if {x^(k)} is convergent), which is usually the case in practice, then it makes no difference whether we take [AP-4] or [AP-5]. In this case, f(x^(k)) ↓ f(x*) and g(x*) = 0, and therefore x* is a local minimizer of f or possibly a saddle point.

    We conclude this section with some comments about the conditions [AP-4]

    and [AP-5] used in the convergence proofs.

    The condition [AP-4] is much weaker than the condition [AP-5] and is

    the weakest condition that has been used to prove global convergence for CG

    algorithms (Fletcher[F1]). Regarding [AP-5], we note that negligible reductions in function values can occur if the search directions s^(k) are close to being orthogonal

    to the negative gradients, and the condition [AP-5] ensures that this does not

    happen. We have already seen that a sufficient condition for the realization of

    [AP-5] with the GPR algorithm is [AP-3]. Another set of conditions, adapted

    from Liu and Storey[L2], is considered below.


    Proposition 3.22. In the GPR algorithm, under [AP-1] and [AP-2], suppose that ∀k > 1,

    [AP-6a] s^(k−1)T G^(k) s^(k−1) > 0,

    [AP-6b] g^(k)T G^(k) g^(k) > 0,

    [AP-6d] (g^(k)T G^(k) s^(k−1))² ≤ (1 − 1/γ_k) (g^(k)T G^(k) g^(k)) (s^(k−1)T G^(k) s^(k−1)) for some γ_k ≥ 1,

    whenever g^(k) ≠ 0. Then if

    [AP-6e] Σ_{k=1}^∞ 1/(1 + γ_k r_k) = ∞,

    then [AP-4] holds. On the other hand, if

    [AP-6f] lim inf_{k→∞} 1/(1 + γ_k r_k) > 0,

    then [AP-5] holds.

    Proof. Set, ∀k > 1,

    u_k ≜ g^(k)T G^(k) s^(k−1),  t_k ≜ g^(k)T G^(k) g^(k),  v_k ≜ s^(k−1)T G^(k) s^(k−1),   (3.3.13a)

    q_k ≜ g^(k)T s^(k−1) / (g^(k)T g^(k)).   (3.3.13b)

    Then it follows from [AP-6a], [AP-6b] and [AP-6d] that

    1 − u_k² / (t_k v_k) ≥ 1/γ_k > 0.

    Moreover, from (3.1.15c) and (3.1.15d), we obtain, ∀k > 1,

    (3.3.14a)


    which gives, on simplification,

    (3.3.14b)

    Thus,

    (3.3.14c)

    using [AP-6c] and (3.3.13a).

    Now, from (3.1.15b), (3.1.15c) and (3.3.14a), we obtain, ∀k > 1,


    In this connection, it may be noted that if G^(k) is positive-definite, then [AP-6a], [AP-6b] and [AP-6c] hold with r_k ≜ χ^(k), where χ^(k) is the spectral condition number of G^(k), that is, the ratio λ_n^(k)/λ_1^(k) of the largest to the smallest eigenvalues of G^(k). It is possible to see that [AP-6c] and [AP-6d] are verified for r_k = γ_k = r ≥ 1. Then the restrictions [AP-6e] and [AP-6f] are automatically satisfied.

    We further observe that the Zoutendijk condition (Zoutendijk[Z2])

    Σ_{k=1}^∞ cos² θ^(k) ||g^(k)||² < ∞   (3.3.18a)

    is satisfied by the GPR algorithm under conditions [AP-1] and [AP-2] and the line search condition (3.3.3). This is so, because, from (3.3.3), (3.2.16) and (3.1.17), we have,

    f^(k+1) ≤ f^(k) − ρ cos² θ^(k) ||g^(k)||²   (3.3.18b)

    for all k, so that ∀N ≥ 1,

    Σ_{k=1}^N cos² θ^(k) ||g^(k)||² ≤ (1/ρ) (f^(1) − f_*),   (3.3.18c)

    where f_* is as defined by (3.3.10d).

    We now analyse some weakened conditions for the convergence of the GPR algorithm. From (3.3.18a) and (3.2.15), we have

    Σ_{k=1}^∞ ||g^(k)||⁴ / ||s^(k)||² < ∞.   (3.3.18d)

    Hence if the limit (3.3.9b) is not achieved, then since

    ||g^(k)|| ≥ ε > 0 for all k,   (3.3.18e)

    so, by the comparison test,

    Σ_{k=1}^∞ 1 / ||s^(k)||² < ∞,   (3.3.19)

    which requires that ||s^(k)|| → ∞ sufficiently rapidly. Indeed, if ||s^(k)||² = O(k) as k → ∞, then ||s^(k)||² ≤ ck for some c > 0 and hence Σ_k 1/||s^(k)||² = ∞,


    and the failure of (3.3.19) implies that the limit (3.3.9b) is achieved. We discuss below some conditions which ensure this with the GPR algorithm.

    We continue to assume that conditions [AP-1] and [AP-2] are satisfied and the GPR algorithm is initiated at x^(1) as in [AP-2]. By continuity of g(·), we have, for some ϑ > 0,

    ||g^(k)|| ≤ ϑ   (3.3.20)

    for all k ≥ 1.

    Proposition 3.23. For all k ≥ l > 1,

    ||s^(k)||² ≤ ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(k)² + β^(k)² β^(k−1)² + ⋯ + β^(k)² β^(k−1)² ⋯ β^(l)²).   (3.3.21)

    Proof. We have, from (3.2.14a), (3.2.14c) and (3.3.20),

    ||s^(k)||² ≤ ϑ² + β^(k)² ||s^(k−1)||²   (3.3.22a)

    for all k > 1. Consider any l > 1. From (3.3.22a), we obtain, using (3.2.14b) and (3.3.20),

    ||s^(l)||² ≤ ϑ² (||s^(l−1)||² / ||g^(l−1)||²) + β^(l)² ϑ² (||s^(l−1)||² / ||g^(l−1)||²) = ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(l)²)   (3.3.22b)

    and further, assuming (3.3.21) for some k ≥ l,

    ||s^(k+1)||² ≤ ϑ² (||s^(l−1)||² / ||g^(l−1)||²) + β^(k+1)² ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(k)² + β^(k)² β^(k−1)² + ⋯ + β^(k)² β^(k−1)² ⋯ β^(l)²)

    = ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(k+1)² + β^(k+1)² β^(k)² + ⋯ + β^(k+1)² β^(k)² ⋯ β^(l)²).   (3.3.22c)

    From (3.3.22b) and (3.3.22c), by induction, the proposition is verified. ∎


    We now consider the assumption:

    [AP-7] There exists δ > 0 such that ∀k ≥ 1,

    ||g^(k)|| ≥ δ.   (3.3.23)

    We remark that the search direction s^(k) in the GPR algorithm is independent of the length of the auxiliary vector s̄^(k−1).


    |β^(k)| ≤ M ||g^(k)|| ||s̄^(k−1)|| / (ρ δ²) ≜ b,   (3.3.27a)

    where b > 1 for δ sufficiently small.   (3.3.27b)

    On the other hand, (3.2.14c) gives

    ||d^(k−1)|| ≤ λ ⟹ α^(k−1) ||s^(k−1)|| ≤ λ ⟹ ||s^(k−1)|| ≤ λ/a   (3.3.28a)

    for k > 1, where 0 < a ≤ α^(k) exists by Proposition 3.18. So, in this case, we obtain, from (3.3.27a), using (3.3.20), (3.3.23) and (3.3.28a),

    |β^(k)| ≤ M ϑ λ / (ρ δ² a) = 1/b   (3.3.28b)

    if, by (3.3.27b),

    (3.3.28c)

    The above Proposition shows that the GPR algorithm shares the "Property (*)" of Gilbert and Nocedal[G3] under certain conditions which are not too restrictive. The next proposition, adapted from Gilbert and Nocedal[G3], shows that if, in addition, some restriction on the step sizes in the GPR algorithm is imposed, then ||s^(k)||² can grow at most linearly.

Proposition 3.25. Suppose that in the GPR algorithm, $g^{(k)} \ne 0$ for all $k$ and that the conditions [AP-9a] and [AP-9b] are satisfied. Then if

[AP-10] for any $\lambda > 0$ there exist integers $l > 1$ and $\tau \ge 1$ such that, for any index $k \ge l$, the number of indices $i \in [k,\, k+\tau-1]$ for which $\|d^{(i-1)}\| > \lambda$ does not exceed $\tau/2$,

then $\|s^{(k)}\|^2 \le c\,(k - l + 2)$ for $k \ge l$, where $c > 0$ depends on $l$ but not on $k$.

Proof. For $\lambda > 0$ satisfying [AP-9b], consider integers $l > 1$ and $\tau \ge 1$ given by [AP-10]. By Proposition 3.23, we have, for $k > l$,
\[ \|s^{(k)}\|^2 \le c \Big( 1 + \sum_{i=l}^{k} P^{(i)} \Big), \tag{3.3.29} \]
where $c > 0$ depends on $l$ but not on $k$. Consider the product
\[ P^{(i)} \triangleq \beta^{(k)2}\,\beta^{(k-1)2} \cdots \beta^{(i)2} \tag{3.3.30a} \]
of $(k - i + 1)$ factors of the form $\beta^{(t)2}$, where $i \le t \le k$ and $l \le i \le k$.

If $k - i + 1 \le \tau$, then we have, by [AP-9a],
\[ P^{(i)} \le b^{2(k-i+1)} \le b^{2\tau}. \tag{3.3.30b} \]
If $k - i + 1 > \tau$, let $k - i + 1 = m\tau + h$, where $m \ge 1$ and $0 \le h < \tau$, and rewrite $P^{(i)}$ by grouping consecutive $\tau$ factors from the beginning:
\[ P^{(i)} = P_0^{(i)}\,P_1^{(i)} \cdots P_{m-1}^{(i)}\,Q^{(i)}, \tag{3.3.30c} \]
where
\[ P_t^{(i)} \triangleq \beta^{(k_t)2}\,\beta^{(k_t-1)2} \cdots \beta^{(k_{t+1}+1)2}, \tag{3.3.31a} \]
\[ k_t \triangleq k - t\tau, \qquad 0 \le t \le m-1, \tag{3.3.31b} \]
and
\[ Q^{(i)} \triangleq \beta^{(k_m)2}\,\beta^{(k_m-1)2} \cdots \beta^{(i)2}, \tag{3.3.31c} \]
\[ k_m \triangleq k - m\tau, \tag{3.3.31d} \]
there being $\tau$ factors in each $P_t^{(i)}$ and $h$ factors in $Q^{(i)}$ (with $Q^{(i)} = 1$ if $h = 0$).

Let $p_t^{(i)}$ be the number of indices $j \in [k_{t+1}+1,\, k_t]$ such that $\|d^{(j-1)}\| > \lambda$. By [AP-10],
\[ p_t^{(i)} \le \frac{\tau}{2} \tag{3.3.32a} \]
and hence, by [AP-9a] and [AP-9b],
\[ P_t^{(i)} \le \big(b^2\big)^{p_t^{(i)}} \Big( \frac{1}{b^2} \Big)^{\tau - p_t^{(i)}} = \big(b^2\big)^{-(\tau - 2 p_t^{(i)})} \le 1 \tag{3.3.32b} \]
in view of (3.3.32a) and $b > 1$. Moreover, by [AP-9a],
\[ Q^{(i)} \le b^{2h} \le b^{2\tau}. \tag{3.3.32c} \]
Thus, from (3.3.30b), (3.3.30c), (3.3.32b) and (3.3.32c), it follows that $P^{(i)} \le b^{2\tau}$ for each $l \le i \le k$, and hence, from (3.3.29), we obtain, for $k \ge l$,
\[ \|s^{(k)}\|^2 \le c\,\big(1 + b^{2\tau}(k - l + 1)\big) \le c\,b^{2\tau}(k - l + 2), \tag{3.3.33} \]
as $b > 1$. ∎

From the above discussion, we then have the following convergence result:

Theorem 3.26. Suppose that conditions [AP-1] and [AP-2] are satisfied and the GPR algorithm is executed with line searches satisfying (3.3.3) for some $\rho > 0$. Then if $g^{(k)} \ne 0$ for all $k$ and conditions [AP-9a], [AP-9b] and [AP-10] hold, the limit (3.3.9b) is achieved.

Proof. This follows directly from Proposition 3.25 and the preceding discussion. ∎

Of course, in view of Proposition 3.24, we can replace the conditions [AP-9a] and [AP-9b] by conditions [AP-7] and [AP-8] in the above theorem.


3.4 Rate of Convergence

In this section, we analyse the rate of convergence of the GPR algorithm for solving the problem [P] under assumptions [AP-1]-[AP-3] (as stated on page 15 and page 23) and the following additional assumption:

[AP-11] For $x^{(1)}$ as in [AP-2], the level set $L(x^{(1)})$ is convex and there exists $B > 0$ such that for all $x', x'' \in L(x^{(1)})$,
\[ \|G(x') - G(x'')\| \le B\,\|x' - x''\|. \tag{3.4.1} \]

We also assume that the GPR algorithm, initiated at $x^{(1)}$ satisfying [AP-2], is executed with exact line search at each iteration. Our approach follows that in Cohen[C2].

It may be remarked that, in the case of a quadratic objective function, the GPR algorithm with exact line search terminates at the optimal point in at most $n$ iterations. If the objective function is non-quadratic, then finite termination does not occur in general. However, as we shall see, with exact line search the algorithm possesses $n$-step quadratic convergence when reinitialized with a steepest descent direction.

We only consider the case when the GPR algorithm is reinitialized. Let $\phi$ denote the GPR algorithm applied to the general function $f$ with exact line search at each step, as described by (3.1.15) with $s_1^{(k)} = s^{(k)}$.

For each reinitialized point $x^{(k)}$ constructed by $\phi$, let $F^{(k)} : \mathbb{R}^n \to \mathbb{R}$ be the quadratic function defined by
\[ F^{(k)}(x) \triangleq f^{(k)} + g^{(k)T}\big(x - x^{(k)}\big) + \tfrac{1}{2}\big(x - x^{(k)}\big)^T G^{(k)} \big(x - x^{(k)}\big). \tag{3.4.2} \]
Suppose that $\phi_{F^{(k)}}$ denotes the GPR algorithm applied to $F^{(k)}$, starting at $x^{(k)}$ and constructing the iterates $x_{i+1}^{(k)}$ along directions $s_i^{(k)}$ at $x_i^{(k)}$, where
\[ s_1^{(k)} = s^{(k)}, \qquad s_i^{(k)} = -\,g_i^{(k)} + \beta_i^{(k)}\,s_{i-1}^{(k)} \quad \text{for } i > 1, \tag{3.4.3} \]
with $g_i^{(k)} = \nabla F^{(k)}\big(x_i^{(k)}\big)$ and the $\alpha_i^{(k)}$'s determined by exact line search (here, subscripts $i$ denote the iterates for the quadratic $F^{(k)}$).

Lemma 3.29. For $l \ge 0$,
\[ \|G^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|). \tag{3.4.6} \]

Proof. The lemma is trivially true for $l = 0$. For $k \ge 1$ and $l \ge 1$, we have
\[ \|G^{(k+l)} - G^{(k)}\| \le \sum_{j=0}^{l-1} \|G^{(k+j+1)} - G^{(k+j)}\|. \]
But from [AP-11], Proposition 3.11 and Lemma 3.27,
\[ \|G^{(k+j+1)} - G^{(k+j)}\| = \|G(x^{(k+j+1)}) - G(x^{(k+j)})\| \le B\,\|\alpha^{(k+j)}\,s^{(k+j)}\| = O(\|s^{(k+j)}\|) = O(\|s^{(k)}\|), \]
the step lengths $\alpha^{(k+j)}$ being bounded. Hence
\[ \|G^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|) \]
for all $l \ge 0$. ∎

Lemma 3.30. For $l \ge 0$,
\[ \|\bar{G}^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|), \tag{3.4.7} \]
where $\bar{G}^{(k)}$ is as defined by (3.2.8).

Proof. For $k \ge 1$ and $l \ge 0$, we have
\[ \|\bar{G}^{(k+l)} - G^{(k)}\| \le \|\bar{G}^{(k+l)} - G^{(k+l)}\| + \|G^{(k+l)} - G^{(k)}\|. \]
But, by (3.2.8), [AP-11] and (3.2.23a),
\[ \|\bar{G}^{(k+l)} - G^{(k+l)}\| = \Big\| \int_0^1 \big\{ G\big(x^{(k+l)} + t\,\alpha^{(k+l)}\,s^{(k+l)}\big) - G^{(k+l)} \big\}\,dt \Big\| \le \int_0^1 \big\| G\big(x^{(k+l)} + t\,\alpha^{(k+l)}\,s^{(k+l)}\big) - G^{(k+l)} \big\|\,dt = O(\|s^{(k+l)}\|). \]
Hence, by Lemmas 3.27 and 3.29, we have
\[ \|\bar{G}^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|) \]
for all $l \ge 0$. ∎

Lemma 3.31. For $0 \le l \le n - 1$,
\[ \|g^{(k+l+1)} - g_{l+2}^{(k)}\| = O(\|g^{(k+l)} - g_{l+1}^{(k)}\|) + O(\|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\|) + O(\|s^{(k)}\|^2). \tag{3.4.8} \]

Proof. For $k \ge 1$ and $0 \le l \le n - 1$, we have, by (3.2.7), (3.2.18) and (3.2.23a),
\[ \begin{aligned} \|g^{(k+l+1)} - g_{l+2}^{(k)}\| &= \|g^{(k+l)} + \alpha^{(k+l)}\,\bar{G}^{(k+l)}\,s^{(k+l)} - g_{l+1}^{(k)} - \alpha_{l+1}^{(k)}\,G^{(k)}\,s_{l+1}^{(k)}\| \\ &\le \|g^{(k+l)} - g_{l+1}^{(k)}\| + \|\bar{G}^{(k+l)}\|\,\|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| + \alpha_{l+1}^{(k)}\,\|\bar{G}^{(k+l)} - G^{(k)}\|\,\|s_{l+1}^{(k)}\| \\ &\le \|g^{(k+l)} - g_{l+1}^{(k)}\| + M\,\|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| + O\big(\|\bar{G}^{(k+l)} - G^{(k)}\|\,\|s^{(k)}\|\big). \end{aligned} \]
Hence, using Lemmas 3.28 and 3.30 (and Lemma 3.27 to bound $\|s_{l+1}^{(k)}\|$ by $O(\|s^{(k)}\|)$), we conclude that (3.4.8) holds for $0 \le l \le n - 1$. ∎

Lemma 3.32. For $0 \le l < n - 1$,
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| = O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|g^{(k+l+1)} - g_{l+2}^{(k)}\|) + O(\|s^{(k)}\|^2). \tag{3.4.9} \]

Proof. For $0 \le l < n - 1$, we have, from (3.1.15) and (3.4.3),
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| \le \|g^{(k+l+1)} - g_{l+2}^{(k)}\| + \|\beta^{(k+l+1)}\,s^{(k+l)} - \beta_{l+2}^{(k)}\,s_{l+1}^{(k)}\|. \tag{3.4.10a} \]
The second term is estimated by writing the two $\beta$-quotients over their common denominator $c_{l+1}$ and bounding each of the resulting terms by means of (3.2.18), (3.2.14b), (3.3.20) and Lemmas 3.27-3.30; this yields a bound (3.4.11) in which every term is of one of the three orders appearing in (3.4.9), since, from (3.4.10b), using (3.2.18),
\[ c_{l+1} \ge m^2\,\|s^{(k+l)}\|^2\,\|s_{l+1}^{(k)}\|^2. \]
Hence, from (3.4.10a) and (3.4.11), using Lemmas 3.27 and 3.29, it follows that
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| = O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|g^{(k+l+1)} - g_{l+2}^{(k)}\|) + O(\|s^{(k)}\|^2) \]
for $0 \le l < n - 1$. ∎

Lemma 3.33. For $0 \le l \le n - 1$,
\[ \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|g^{(k+l)} - g_{l+1}^{(k)}\|) + O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|s^{(k)}\|^2). \tag{3.4.15} \]

Proof. For $k \ge 1$ and $0 \le l \le n - 1$, we have, by (3.2.22),
\[ \alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)} = \frac{\|g^{(k+l)}\|^2\,s^{(k+l)}}{s^{(k+l)T}\,\bar{G}^{(k+l)}\,s^{(k+l)}} - \frac{\|g_{l+1}^{(k)}\|^2\,s_{l+1}^{(k)}}{s_{l+1}^{(k)T}\,G^{(k)}\,s_{l+1}^{(k)}}, \]
where $\bar{G}^{(k)}$ is as defined by (3.2.8). Writing this difference over the common denominator
\[ c_{l+1} = \big(s^{(k+l)T}\,\bar{G}^{(k+l)}\,s^{(k+l)}\big)\big(s_{l+1}^{(k)T}\,G^{(k)}\,s_{l+1}^{(k)}\big) \ge m^2\,\|s^{(k+l)}\|^2\,\|s_{l+1}^{(k)}\|^2 \]
and bounding the resulting terms by means of (3.2.18) and (3.2.14b), we obtain
\[ \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| \le \frac{\vartheta^2}{m^2}\Big\{ 2M\,\|g^{(k+l)} - g_{l+1}^{(k)}\| + 3M\,\|s^{(k+l)} - s_{l+1}^{(k)}\| + \|\bar{G}^{(k+l)} - G^{(k)}\|\,\|s_{l+1}^{(k)}\| \Big\}. \]
Hence, using Lemmas 3.28 and 3.30,
\[ \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|g^{(k+l)} - g_{l+1}^{(k)}\|) + O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|s^{(k)}\|^2) \]
for $0 \le l \le n - 1$. ∎

Lemma 3.34. For $0 \le l \le n - 1$,
\[ \text{(i)}\quad \|g^{(k+l)} - g_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2), \tag{3.4.16} \]
\[ \text{(ii)}\quad \|s^{(k+l)} - s_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2), \tag{3.4.17} \]
\[ \text{(iii)}\quad \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2). \tag{3.4.18} \]

Proof. We prove (3.4.16), (3.4.17) and (3.4.18) simultaneously by finite induction on $l \in [0,\, n-1]$.

For $l = 0$, (3.4.16) and (3.4.17) are trivially true, since, by definition of $g_1^{(k)}$ and $s_1^{(k)}$,
\[ g_1^{(k)} = g^{(k)} \tag{3.4.19a} \]
and
\[ s_1^{(k)} = s^{(k)}. \tag{3.4.19b} \]
That (3.4.18) is also true for $l = 0$ follows from Lemma 3.33 using (3.4.19a) and (3.4.19b).

Assume now that (3.4.16), (3.4.17) and (3.4.18) are true for some $0 \le l < n - 1$. It follows that
\[ \|g^{(k+l+1)} - g_{l+2}^{(k)}\| = O(\|s^{(k)}\|^2) \]
by Lemmas 3.31 and 3.33 and the induction hypothesis. Also,
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| = O(\|s^{(k)}\|^2) \]
by Lemmas 3.32 and 3.31 and the induction hypothesis. Moreover, since $l + 1 \le n - 1$, we have, from Lemma 3.33,
\[ \|\alpha^{(k+l+1)}\,s^{(k+l+1)} - \alpha_{l+2}^{(k)}\,s_{l+2}^{(k)}\| = O(\|s^{(k)}\|^2), \]
which completes the induction. ∎

Proposition 3.35. If the iterates $x^{(k)}$, generated by the GPR algorithm, converge to a local solution $x^*$ of [P], then at each iteration,
\[ \|g^{(k)}\| = O(\|x^{(k)} - x^*\|). \tag{3.4.21} \]

Proof. Applying the mean value theorem to $g(\cdot)$, we obtain
\[ g^{(k)} = g(x^*) + G(\xi^{(k)})\big(x^{(k)} - x^*\big) \tag{3.4.22} \]
for some $\xi^{(k)} \in (x^{(k)}, x^*)$. As the sequence $\{x^{(k)}\}$ in the compact set $L(x^{(1)})$ converges to $x^*$, we have $x^* \in L(x^{(1)})$ and hence $\xi^{(k)} \in L(x^{(1)})$, because $L(x^{(1)})$ is convex by [AP-11]. Moreover, since $x^*$ is a local minimizer of $f$, $g(x^*) = 0$. Thus, from (3.4.22), we obtain, using (3.0.2),
\[ \|g^{(k)}\| = \|G(\xi^{(k)})\big(x^{(k)} - x^*\big)\| \le M\,\|x^{(k)} - x^*\|. \]
∎

We now consider the $n$-step quadratic convergence result for the GPR algorithm applied to the problem [P] under the conditions stated in the first paragraph of this section.

Notice that the stationary point $x^*_{F^{(k)}}$ of the quadratic $F^{(k)}$ is given by
\[ \nabla F^{(k)}\big(x^*_{F^{(k)}}\big) = 0, \tag{3.4.23} \]
that is,
\[ x^*_{F^{(k)}} = x^{(k)} - \big(G^{(k)}\big)^{-1} g^{(k)} = \psi_{NR}\big(x^{(k)}\big), \tag{3.4.24} \]
where $\psi_{NR}$ is the Newton-Raphson algorithm applied to $f$. Since the Newton-Raphson algorithm $\psi_{NR}$ is quadratically convergent to the stationary point $x^*$ of $f$, we get
\[ \|x^*_{F^{(k)}} - x^*\| = \|\psi_{NR}(x^{(k)}) - x^*\| = O(\|x^{(k)} - x^*\|^2). \tag{3.4.25} \]

Also, using the fact that the GPR algorithm $\phi_{F^{(k)}}$ reaches the minimum point $x^*_{F^{(k)}}$ of the quadratic $F^{(k)}$ in at most $n$ iterations, we have
\[ x^*_{F^{(k)}} = \phi_{F^{(k)}}\big(x^{(k)}\big) = x_{n+1}^{(k)}. \tag{3.4.26} \]
Hence, using Lemma 3.34, we conclude that
\[ \|x^{(k+n)} - x^*_{F^{(k)}}\| \le \sum_{l=0}^{n-1} \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2). \tag{3.4.27} \]

We assume that the GPR algorithm is restarted every $t$ iterations ($t \ge n$) with the steepest descent direction. In this case, $s^{(kt)}$ will be set equal to $-g^{(kt)}$ every $t$ iterations, and all the lemmas considered previously in this section hold for $k$ an integral multiple of $t$. The next theorem gives the $n$-step quadratic convergence result for the GPR algorithm applied to [P] with reinitialization every $t$ iterations when an exact line search is adopted at each step.

Theorem 3.36. For the sequence $\{x^{(k)}\}$ generated by the GPR algorithm restarted every $t$ steps with the steepest descent direction,
\[ \limsup_{k \to \infty} \frac{\|x^{(kt+n)} - x^*\|}{\|x^{(kt)} - x^*\|^2} \le C < \infty \tag{3.4.28} \]
for some constant $C$, where $x^*$ is a minimum point of $f$ on $\mathbb{R}^n$.

Proof. For the restart indices $kt$, we obtain, as in (3.4.25) and (3.4.27),
\[ \|x^*_{F^{(kt)}} - x^*\| = O(\|x^{(kt)} - x^*\|^2) \tag{3.4.29} \]
and
\[ \|x^{(kt+n)} - x^*_{F^{(kt)}}\| = O(\|s^{(kt)}\|^2). \tag{3.4.30} \]
Because $s^{(kt)} = -g^{(kt)}$, (3.4.30) becomes
\[ \|x^{(kt+n)} - x^*_{F^{(kt)}}\| = O(\|g^{(kt)}\|^2) = O(\|x^{(kt)} - x^*\|^2) \tag{3.4.31} \]
due to Proposition 3.35. So, from (3.4.29) and (3.4.31), we obtain
\[ \|x^{(kt+n)} - x^*\| \le \|x^{(kt+n)} - x^*_{F^{(kt)}}\| + \|x^*_{F^{(kt)}} - x^*\| = O(\|x^{(kt)} - x^*\|^2), \]
and this completes the proof of (3.4.28). ∎
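The rate (3.4.28) can be observed empirically by recording the error once per restart cycle. The following diagnostic sketch is illustrative only; `iterates` and `x_star` are assumed to come from a run of the restarted algorithm on a test problem with known minimizer:

```python
import numpy as np

def n_step_quadratic_ratios(iterates, x_star, t, n):
    """Ratios ||x_{kt+n} - x*|| / ||x_{kt} - x*||^2 over restart cycles;
    a bounded sequence of ratios is consistent with (3.4.28)."""
    ratios = []
    k = 0
    while k * t + n < len(iterates):
        e0 = np.linalg.norm(iterates[k * t] - x_star)
        en = np.linalg.norm(iterates[k * t + n] - x_star)
        if e0 > 0.0:
            ratios.append(en / e0**2)
        k += 1
    return ratios
```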

    3.5 Characteristic Behaviour and Basic Algorithm


In this section, we discuss the characteristic behaviour of the GPR algorithm and formulate an implementable version of it. This is almost identical to the traditional CG algorithm, differing only in the computation of the search directions.

We first compare the search directions of the GPR algorithm with those of the generalized CG method of Liu and Storey[L2], as described in (2.3.18), and of the memoryless BFGS-QN (abbreviated as MQN) method of Shanno[S4]. To distinguish, we denote these by $s_{GPR}^{(k)}$, $s_{LS}^{(k)}$ and $s_{MQN}^{(k)}$ respectively.

As in (2.3.18),
\[ s_{LS}^{(k)} = -\,g^{(k)} + \frac{g^{(k)T}\,G^{(k)}\,g^{(k)}}{g^{(k)T}\,G^{(k)}\,s^{(k-1)}}\; s^{(k-1)}, \tag{3.5.1} \]
and hence, using (3.1.15c) and (3.1.16), we have
\[ s_{GPR}^{(k)} = \frac{1}{s^{(k-1)T}\,G^{(k)}\,s^{(k-1)}} \left\{ -\big(s^{(k-1)T}\,G^{(k)}\,s^{(k-1)}\big)\, g^{(k)} + \big(g^{(k)T}\,G^{(k)}\,s^{(k-1)}\big)\, s^{(k-1)} \right\}, \]
so that both directions lie in the plane spanned by $g^{(k)}$ and $s^{(k-1)}$.

A corresponding comparison can be made with the MQN direction, which can be written in the form (3.5.7) with a free scaling parameter $\theta^{(k-1)}$ (3.5.7b). Now, if we choose, in particular, $\theta^{(k-1)} = 1/\big(y^{(k-1)T}\,y^{(k-1)}\big)$, then (3.5.7) immediately reduces to the MQN search direction of Shanno[S4].

If the product $G^{(k)} s^{(k-1)}$ is approximated, for $k > 1$, by the backward finite-difference formula
\[ G^{(k)}\,s^{(k-1)} \approx \frac{g^{(k)} - \bar{g}^{(k-1)}}{\delta} \tag{3.5.10} \]
with $\bar{g}^{(k-1)} \triangleq g\big(x^{(k)} - \delta\,s^{(k-1)}\big)$ and $\delta$ any suitable small positive number, then the Hessian matrix $G^{(k)}$ itself need not be computed or stored. This way of avoiding

matrix computation and storage can result in significant savings on large-scale problems, and may be essential when it is not possible to store $G^{(k)}$. The linear algebra required to obtain the product on the left-hand side of (3.5.10) is also reduced. On the other hand, this gain is obtained at the expense of one additional gradient evaluation per iteration. Now, considering (3.1.16) and (3.5.10), (3.1.15d) becomes, for $k > 1$,
\[ \beta_{GPR}^{(k)} = \frac{g^{(k)T}\big(g^{(k)} - \bar{g}^{(k-1)}\big)}{s^{(k-1)T}\big(g^{(k)} - \bar{g}^{(k-1)}\big)}. \tag{3.5.11} \]
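Equation (3.5.10) is straightforward to realise in code. The sketch below is a minimal illustration (not the thesis's implementation); a NumPy gradient routine `grad` is assumed, and the function name is ours:

```python
import numpy as np

def hess_vec(grad, x, s, delta):
    """Backward-difference approximation to G(x) @ s, as in (3.5.10):
    one extra gradient evaluation replaces any computation or storage
    of the n-by-n Hessian matrix."""
    return (grad(x) - grad(x - delta * s)) / delta
```

Note that in the quotient (3.5.11) the factor $1/\delta$ cancels between numerator and denominator, so only the difference $g^{(k)} - \bar{g}^{(k-1)}$ is actually needed there.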

An implementable algorithm for the GPR algorithm with natural restart and based on the search direction given in (3.1.15b) is stated below:

Algorithm: GPR1

Step 1. Let $x^{(1)}$ be an estimate of a minimizer $x^*$ of $f$.

Step 2. Set $k = 1$ and compute $s^{(1)} = -g^{(1)}$.

Step 3. Line search: compute $x^{(k+1)} = x^{(k)} + \alpha^{(k)} s^{(k)}$ and then compute $g^{(k+1)}$.

Step 4. If $\|g^{(k+1)}\| < \epsilon$, take $x^{(k+1)}$ as $x^*$ and stop. Otherwise go to Step 5.

Step 5. If $k + 1 > n$ (with $n \ge 2$), then go to Step 11. Otherwise go to Step 6.

Step 6. Compute $s^{(k)T} s^{(k)}$.

Step 7. With $\delta = \min\big(1,\ \sqrt{\eta}\,/\sqrt{s^{(k)T} s^{(k)}}\big)$, where $\eta$ is a small positive constant, compute $\bar{g}^{(k)} = g\big(x^{(k+1)} - \delta\,s^{(k)}\big)$.

Step 8. Compute $\beta^{(k+1)}$ as in (3.5.11), using $g^{(k+1)}$ and $\bar{g}^{(k)}$.

Step 9. Compute the new search direction $s^{(k+1)}$ from (3.1.15b).

Step 10. Set $k = k + 1$ and go to Step 3.

Step 11. Set $x^{(1)} = x^{(k+1)}$ and repeat from Step 2.

To implement the GPR algorithm requires approximately $5n + 3$ double-precision words of working storage and $O(n)$ arithmetic operations per iteration.
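To make the flow of GPR1 concrete, here is a Python sketch of the whole loop. It is a reconstruction under stated assumptions, not the thesis code: the Armijo backtracking routine stands in for the actual line search of Chapter 7, the restart bookkeeping compresses Steps 5 and 11, and the direction update follows the form (4.1.8) of the next chapter, which keeps $g^{(k)T} s^{(k)} = -\|g^{(k)}\|^2$:

```python
import numpy as np

def backtracking(f, x, g, s, alpha=1.0, rho=0.5, c=1e-4, max_halve=60):
    """Simple Armijo backtracking; a stand-in for the thesis line search."""
    fx, gs = f(x), np.dot(g, s)
    for _ in range(max_halve):
        if f(x + alpha * s) <= fx + c * alpha * gs:
            break
        alpha *= rho
    return alpha

def gpr1(f, grad, x, eps=1e-8, eta=1e-12, max_iter=10_000):
    """Schematic of Algorithm GPR1 with natural restart every n steps."""
    n = x.size
    g = grad(x)
    s = -g                                 # Step 2: steepest-descent start
    k = 0                                  # iterations since last restart
    for _ in range(max_iter):
        alpha = backtracking(f, x, g, s)   # Step 3: line search
        x = x + alpha * s
        g = grad(x)
        if np.linalg.norm(g) < eps:        # Step 4: convergence test
            return x
        k += 1
        if k >= n:                         # Steps 5/11: natural restart
            s, k = -g, 0
            continue
        # Step 7: backward-difference point, delta = min(1, sqrt(eta)/||s||)
        delta = min(1.0, np.sqrt(eta) / np.linalg.norm(s))
        y = g - grad(x - delta * s)        # ~ delta * G(x) s, per (3.5.10)
        # Step 8: generalized PR beta; delta cancels in the quotient (3.5.11)
        beta = np.dot(g, y) / np.dot(s, y)
        # Step 9: new direction; the q-term enforces g^T s = -||g||^2
        # (this exact form is our reconstruction of (3.1.15b) via (4.1.8))
        q = np.dot(g, s) / np.dot(g, g)
        s = -(1.0 + beta * q) * g + beta * s
    return x
```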

CHAPTER 4 : SOME MODIFICATIONS OF THE GPR ALGORITHM AND THEIR IMPLEMENTATIONS

In this chapter, we consider several modifications of the GPR algorithm to try to improve its convergence properties and computational efficiency, and discuss their implementations for solving the general nonlinear problem

[P] Minimize $f(x)$, $x \in \mathbb{R}^n$.

We also discuss the theoretical and algorithmic behaviour of these modified algorithms.

As in the previous chapter, we assume that the objective function $f$ satisfies the basic conditions [AP-1] and [AP-2] of that chapter and that the GPR algorithms are initiated at $x^{(1)}$ satisfying [AP-2]. Other conditions will be added whenever they are required.

    whenever they are required.

A CG algorithm with exact line search finds the minimum of a convex quadratic function of $n$ variables in at most $n$ iterations. Such an algorithm with periodic restart is often globally convergent as well as $n$-step quadratically convergent when the line search is taken to be asymptotically exact (Luksan[L3] and Baptist and Stoer[B6]). Nevertheless, without regular restart, the PR algorithm can cycle infinitely without approaching an optimal point, or can sometimes slow down away from the optimal point (Powell[P4, P2, P5]), because a very small step $\|x^{(k+1)} - x^{(k)}\|$ is taken at each iteration. The GPR algorithm is approximately the same as the PR algorithm when working with an exact line search, and we have noticed that, in general, it may require a very large number of iterations to approach the solution point unless a restart is made occasionally with the steepest descent direction. So, we propose some restarting strategies for


the GPR algorithm which will hopefully improve its computational efficiency and reduce its CPU time.

    4.1 GPR Algorithm with Non-negative Beta

Powell[P4] has shown that there are functions $f$ satisfying conditions [AP-1] and [AP-2] for which the PR algorithm, even with exact line search and exact arithmetic, generates gradients which stay bounded away from zero. Powell's example requires that some consecutive search directions become almost opposite, and as this can only occur, in the case of exact line search, when $\beta_{PR}^{(k)} < 0$, Powell[P5] suggests a new implementation of the PR algorithm with a non-negative value for $\beta_{PR}^{(k)}$ taken at each iteration. Motivated by Powell's suggestion, and by the fact that $\beta_{GPR}^{(k)} \approx \beta_{PR}^{(k)}$ when working with exact line search, we propose an implementation of the GPR algorithm with $\beta_{GPR}^{(k)} \ge 0$ to prevent cycling. This modified algorithm will be called the GPR Algorithm with Non-negative Beta (GPR+ Algorithm, in short).

The search directions in the GPR+ algorithm, obtained from (3.1.15), are
\[ s_{GPR+}^{(k)} = \begin{cases} -\,g^{(k)} & \text{for } k = 1, \\ \text{the direction (3.1.15b) with } \beta_{GPR}^{(k)} \text{ replaced by } \beta_{GPR+}^{(k)} & \text{for } k > 1, \end{cases} \tag{4.1.1} \]
where $\beta_{GPR+}^{(k)}$ is given by
\[ \beta_{GPR+}^{(k)} = \max\big\{ \beta_{GPR}^{(k)},\ 0 \big\} \tag{4.1.2} \]
on all iterations, $\beta_{GPR}^{(k)}$ being given by (3.1.15d). Thus, the GPR+ search directions are given by
\[ s_{GPR+}^{(k)} = \begin{cases} s_{GPR}^{(k)} & \text{if } \beta_{GPR}^{(k)} > 0, \\ -\,g^{(k)} & \text{otherwise,} \end{cases} \tag{4.1.3} \]
with $k \ge 1$. It follows that the GPR+ algorithm has an "automatic" restarting procedure depending on the values of $\beta_{GPR}^{(k)}$.

    Since (4.1.1) and (4.1.3) are equivalent, it is immaterial which particular

    form is used to describe the GPR+ search directions, and we shall use (4.1.3) in

    our implementation.
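In code, the GPR+ modification is a one-line clipping of $\beta$. A sketch, reusing the hypothetical names of the GPR1 sketch in Section 3.5 (the $q$-term mirrors the direction form used there and is our assumption):

```python
import numpy as np

def gpr_plus_direction(g, s_prev, beta_gpr):
    """GPR+ direction per (4.1.3): keep the GPR direction when
    beta_GPR > 0, otherwise restart along the steepest descent -g."""
    if beta_gpr > 0.0:
        q = np.dot(g, s_prev) / np.dot(g, g)
        return -(1.0 + beta_gpr * q) * g + beta_gpr * s_prev
    return -g
```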


It is easy to see that the GPR+ algorithm preserves the descent property (3.1.17) at each iteration. Moreover, it follows from the construction of $s_{GPR+}^{(k)}$ that the GPR+ algorithm inherits all the properties of the GPR algorithm.

We now formalize a convergence theorem for the GPR+ algorithm. For convenience of notation, we continue to use $s^{(k)}$, $\beta^{(k)}$, etc. for the quantities generated by the GPR+ algorithm, and we write $u^{(k)} \triangleq s^{(k)}/\|s^{(k)}\|$ for the normalized search direction.

Proposition 4.1. Suppose that
\[ \liminf_{k \to \infty} \|g^{(k)}\| > 0. \tag{4.1.5} \]
Then
\[ \sum_{k=2}^{\infty} \|u^{(k)} - u^{(k-1)}\|^2 < \infty. \tag{4.1.6} \]

Proof. By (4.1.5), there exists $\varepsilon > 0$ such that
\[ \varepsilon \le \|g^{(k)}\| \le \vartheta \quad \text{for all } k \ge 1, \tag{4.1.7a} \]
where $\vartheta$ is as defined in (3.3.20). Moreover, as in Proposition 3.24, there exists $b > 1$ such that
\[ |\beta^{(k)}| \le b \tag{4.1.7b} \]
for all $k > 1$.

From (3.1.15b) and (3.1.15c), we obtain
\[ s^{(k)} = -\big(1 + \beta^{(k)} q_k\big)\,g^{(k)} + \beta^{(k)}\,s^{(k-1)} \tag{4.1.8} \]
and hence
\[ u^{(k)} = r^{(k)} + \delta_k\,u^{(k-1)} \tag{4.1.9} \]
for all $k > 1$, where
\[ q_k \triangleq \frac{g^{(k)T}\,s^{(k-1)}}{g^{(k)T}\,g^{(k)}}, \tag{4.1.10a} \]
\[ r^{(k)} \triangleq -\,\big(1 + \beta^{(k)} q_k\big)\,\frac{g^{(k)}}{\|s^{(k)}\|}, \tag{4.1.10b} \]
\[ \delta_k \triangleq \beta^{(k)}\,\frac{\|s^{(k-1)}\|}{\|s^{(k)}\|} \ge 0, \tag{4.1.10c} \]
and hence,
\[ \text{(i)}\quad r^{(k)T}\,u^{(k)} = \|r^{(k)}\|^2 + \delta_k\,r^{(k)T}\,u^{(k-1)}, \tag{4.1.11} \]
\[ \text{(ii)}\quad 1 = u^{(k)T}\,u^{(k)} = \|r^{(k)}\|^2 + 2\,\delta_k\,r^{(k)T}\,u^{(k-1)} + \delta_k^2, \tag{4.1.12a} \]
\[ \text{(iii)}\quad u^{(k)} - u^{(k-1)} = r^{(k)} + (\delta_k - 1)\,u^{(k-1)}, \tag{4.1.12b} \]
so that
\[ \begin{aligned} \|u^{(k)} - u^{(k-1)}\|^2 &= \|r^{(k)}\|^2 + 2(\delta_k - 1)\,r^{(k)T}\,u^{(k-1)} + (\delta_k - 1)^2 \\ &= 2\big(1 - \delta_k - r^{(k)T}\,u^{(k-1)}\big) \\ &= 2\big(1 - \delta_k^2 - (1 + \delta_k)\,r^{(k)T}\,u^{(k-1)}\big)\big/(1 + \delta_k) \\ &= 2\big(r^{(k)T}\,u^{(k)} - r^{(k)T}\,u^{(k-1)}\big)\big/(1 + \delta_k) \\ &= 2\,r^{(k)T}\big(u^{(k)} - u^{(k-1)}\big)\big/(1 + \delta_k) \\ &\le 2\,\|r^{(k)}\|\,\|u^{(k)} - u^{(k-1)}\|\big/(1 + \delta_k), \end{aligned} \tag{4.1.12c} \]

using the Cauchy-Schwarz inequality. Hence,
\[ \|u^{(k)} - u^{(k-1)}\| \le \frac{2\,\|r^{(k)}\|}{1 + \delta_k} \le 2\,\|r^{(k)}\|, \tag{4.1.13} \]
since $\delta_k \ge 0$. But, from (4.1.10b),
\[ \|r^{(k)}\| = \big|1 + \beta^{(k)} q_k\big|\;\frac{\|g^{(k)}\|}{\|s^{(k)}\|} \le c\,\frac{\|g^{(k)}\|}{\|s^{(k)}\|} \tag{4.1.14} \]
for some $c \ge 1$, since, by (4.1.7a), (4.1.7b), (4.1.10a), (3.1.17) and (3.1.20) or (3.3.2), the factor $|1 + \beta^{(k)} q_k|$ is bounded above. From (4.1.13), (4.1.14) and (3.2.15), it then follows that for all $k > 1$,
\[ \|u^{(k)} - u^{(k-1)}\| \le 2c\,\cos\theta^{(k)}, \tag{4.1.15} \]
where $\theta^{(k)} \triangleq \angle\big(-g^{(k)},\, s^{(k)}\big)$. So, if (4.1.6) fails, then
\[ \sum_{k=1}^{\infty} \cos^2\theta^{(k)} = \infty \tag{4.1.16} \]
and hence, by Theorem 3.19, the assumption (4.1.5) fails. This completes the proof. ∎

Theorem 4.2. Suppose that conditions [AP-7] and [AP-8] hold in addition to the conditions stated previously. Then the limit
\[ \lim_{k \to \infty} \|g^{(k)}\| = 0 \tag{4.1.17} \]
is achieved by the GPR+ algorithm.

Proof. First suppose that the condition [AP-10] is satisfied.

We see, from Proposition 3.24, that conditions [AP-9a] and [AP-9b] are satisfied. So, by Proposition 3.25, there exist an integer $l > 1$ and a constant $c > 0$ (depending on $l$) such that for $k \ge l$,
\[ \|s^{(k)}\|^2 \le c\,(k - l + 2). \tag{4.1.18a} \]
It follows that
\[ \sum_{k=1}^{\infty} \frac{1}{\|s^{(k)}\|^2} = \infty, \tag{4.1.18b} \]
and hence (4.1.17) is achieved. For, if not, then there exists $\varepsilon > 0$ such that $\|g^{(k)}\| \ge \varepsilon$ for all $k \ge 1$, and so, since by (3.2.15) $\cos^2\theta^{(k)}\,\|g^{(k)}\|^2 = \|g^{(k)}\|^4/\|s^{(k)}\|^2 \ge \varepsilon^4/\|s^{(k)}\|^2$, we have
\[ \sum_{k=1}^{\infty} \cos^2\theta^{(k)}\,\|g^{(k)}\|^2 = \infty, \tag{4.1.18c} \]
contradicting the Zoutendijk condition (3.3.18a).

We now consider the case when [AP-10] is not satisfied. In this case:

[AP-10]* There exists $\lambda > 0$ such that for all integers $l > 1$ and $\tau \ge 1$, there exists an integer $k \ge l$ such that the number of indices $i \in [k,\, k+\tau-1]$ for which $\|d^{(i-1)}\| > \lambda$ is greater than $\tau/2$.

Assume that
\[ \liminf_{k \to \infty} \|g^{(k)}\| > 0. \tag{4.1.19} \]
The sequence $\{x^{(k)}\}$ being bounded, there exists $B > 0$ such that $\|x^{(k)}\| \le B$ for $k \ge 1$. With $\lambda$ as in [AP-10]*, define the integer $\tau \ge 1$ by
\[ \frac{8B}{\lambda} \le \tau < \frac{8B}{\lambda} + 1. \tag{4.1.20a} \]
By Proposition 4.1, in view of (4.1.19), we have (4.1.6) and hence, with $\tau$ as above, there exists an integer $l > 1$ such that
\[ \sum_{k=l}^{\infty} \|u^{(k)} - u^{(k-1)}\|^2 \le \frac{1}{4\tau}. \tag{4.1.20b} \]

If we select $k \ge l$ as in [AP-10]*, then, since
\[ x^{(k+\tau-1)} - x^{(k-1)} = \sum_{i=k}^{k+\tau-1} d^{(i-1)} = \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,u^{(i-1)}, \]
we have
\[ \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,u^{(k-1)} = x^{(k+\tau-1)} - x^{(k-1)} - \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,\big(u^{(i-1)} - u^{(k-1)}\big). \]
Hence, taking norms,
\[ \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\| \le 2B + \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,\|u^{(i-1)} - u^{(k-1)}\|. \tag{4.1.20c} \]
But for $i \in [k,\, k+\tau-1]$, using the Cauchy-Schwarz inequality and (4.1.20b),
\[ \|u^{(i-1)} - u^{(k-1)}\| \le \sum_{j=k}^{i-1} \|u^{(j)} - u^{(j-1)}\| \le \Big( \tau \sum_{j=k}^{i-1} \|u^{(j)} - u^{(j-1)}\|^2 \Big)^{1/2} \le \frac{1}{2}, \]
and hence, from (4.1.20c), we obtain
\[ 2B \ge \frac{1}{2} \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\| \ge \frac{1}{2} \sum_{\substack{i:\ \|d^{(i-1)}\| > \lambda}} \|d^{(i-1)}\| > \frac{1}{2}\cdot\frac{\tau}{2}\cdot\lambda = \frac{\lambda\tau}{4}, \]
in view of [AP-10]*. Thus, we have
\[ \tau < \frac{8B}{\lambda}, \]
contradicting (4.1.20a). This contradiction leads to the denial of (4.1.19), and so (4.1.17) is achieved. ∎

For implementing the GPR+ algorithm, we have the following modified version of Algorithm GPR1:

Algorithm: GPR2

This is the same as Algorithm GPR1 except that an additional step, namely Step 8a, is inserted between Step 8 and Step 9:

Step 8a. If $\beta_{GPR}^{(k+1)} > 0$, then go to Step 9. Otherwise go to Step 11.

    4.2 GPR Algorithm with Powell Restart

Regarding the GPR+ algorithm, we notice from (3.2.10a) that, when working with exact line searches, $\beta_{GPR}^{(k)} \ge 0$ if and only if $g^{(k)T}\big(g^{(k)} - g^{(k-1)}\big) \ge 0$, that is, if and only if
\[ g^{(k)T}\,g^{(k-1)} \le \|g^{(k)}\|^2. \tag{4.2.1} \]
So, the GPR+ algorithm with exact line search induces a restart in the steepest descent direction whenever (4.2.1) is violated, that is, when
\[ g^{(k)T}\,g^{(k-1)} > \|g^{(k)}\|^2. \tag{4.2.2} \]

The condition (4.2.2) is a less restrictive restarting criterion than the Powell restarting criterion (Powell[P2])
\[ \big|g^{(k)T}\,g^{(k-1)}\big| \ge 0.2\,\|g^{(k)}\|^2. \tag{4.2.3} \]
Even though Powell's criterion (4.2.3) was designed to ensure the convergence of Beale's restarting algorithm (Powell[P2]), we consider its use with the GPR algorithm in the hope of improving efficiency and convergence. The resulting algorithm, called the Powell Restarting GPR Algorithm (PGPR Algorithm, in short), is just as in (3.1.15) except that $\beta_{PGPR}^{(k)} = \beta_{GPR}^{(k)}$ if
\[ \big|g^{(k)T}\,g^{(k-1)}\big| < 0.2\,\|g^{(k)}\|^2 \tag{4.2.4} \]
and $\beta_{PGPR}^{(k)} = 0$ if (4.2.3) occurs.

Moreover, we find, in view of (3.1.15), that the PGPR search direction $s_{PGPR}^{(k)}$ can be written as
\[ s_{PGPR}^{(k)} = \begin{cases} s_{GPR}^{(k)} & \text{if (4.2.4) holds,} \\ -\,g^{(k)} & \text{if (4.2.3) holds,} \end{cases} \tag{4.2.5} \]
for $k \ge 1$, and so the descent property
\[ g^{(k)T}\,s_{PGPR}^{(k)} < 0 \tag{4.2.6} \]
is satisfied at all iterations. It also follows that all the standard properties of the GPR algorithm apply directly to the PGPR algorithm under the same assumptions as those imposed on the GPR algorithm. For instance, we can see that the global convergence results, stated in Theorem 3.19 and Theorem 3.20, hold for the PGPR algorithm provided we suppose that $g^{(k)} \ne 0$ for all $k$ and that the line search satisfies (3.3.3) with some $\rho > 0$.

It may be observed from (2.3.15), (3.2.10a) and (4.2.4) that, in implementing the PGPR algorithm with exact line search, a restart is not induced if
\[ \big|\beta_{GPR}^{(k)} - \beta_{FR}^{(k)}\big| \le 0.2\,\beta_{FR}^{(k)}, \tag{4.2.7a} \]
that is, if
\[ 0.8\,\beta_{FR}^{(k)} \le \beta_{GPR}^{(k)} \le 1.2\,\beta_{FR}^{(k)}. \tag{4.2.7b} \]

Thus, in such implementations of the PGPR algorithm, satisfaction of (4.2.7b) is a measure of the adequacy of $\beta_{GPR}^{(k)}$.

Since the gradients are orthogonal when the GPR algorithm is applied to a quadratic function $q(\cdot)$ and exact line searches are performed (see Proposition 3.3), and since (4.2.3) decides whether enough orthogonality between $g^{(k-1)}$ and $g^{(k)}$ has been lost to warrant a restart, it is necessary for the implementation of the PGPR algorithm on a quadratic function that the line searches be almost exact at all iterations. This means that the line search at each iteration must perform at least one cubic interpolation, which is, in fact, very expensive in terms of function and gradient evaluations.

To implement the PGPR algorithm, we modify Algorithm GPR1 as described below:

Algorithm: GPR3

Just add the following new step, namely Step 5a, to Algorithm GPR1 in between Step 5 and Step 6.

Step 5a. If $\big|g^{(k+1)T}\,g^{(k)}\big| > 0.2\,\|g^{(k+1)}\|^2$, then go to Step 11. Otherwise go to Step 6.
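A sketch of the inserted test in code (the function name is ours; the criterion itself is exactly (4.2.3)):

```python
import numpy as np

def powell_restart(g_new, g_old):
    """Step 5a / criterion (4.2.3): signal a restart when successive
    gradients have lost enough orthogonality, i.e. when
    |g_{k+1}^T g_k| > 0.2 * ||g_{k+1}||^2."""
    return abs(np.dot(g_new, g_old)) > 0.2 * np.dot(g_new, g_new)
```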

    4.3 Shanno's Angle-Test Restarting GPR Algorithm

Shanno's angle-test restart procedure (Shanno[S2]) for CG algorithms sets up a switching criterion for restarting such algorithms with a steepest descent direction: the current direction is retained only when the cosine of the angle between the search direction and the negative gradient is at least a constant multiple of the cosine of the angle between the FR search direction and the negative gradient. It thereby assures the convergence of the modified algorithm, as the FR algorithm is globally convergent. We propose, therefore, an implementation of the GPR algorithm which incorporates the angle-test restart. This new implementation will be referred to as Shanno's Angle-Test Restarting GPR Algorithm (SGPR Algorithm, in short).

Shanno's procedure is based on consideration of the FR process with exact line search (so that $g^{(k)T} s_{FR}^{(k-1)} = 0$). Thus we have
\[ s_{FR}^{(k)} = -\,g^{(k)} + \beta_{FR}^{(k)}\,s_{FR}^{(k-1)} \tag{4.3.1} \]
for $k \ge 1$ and
\[ \cos^2\theta_{FR}^{(k)} = \frac{\|g^{(k)}\|^2}{\|s_{FR}^{(k)}\|^2} \tag{4.3.2} \]
for $k > 1$, where $\theta_{FR}^{(k)} \triangleq \angle\big(-g^{(k)},\, s_{FR}^{(k)}\big)$. From (4.3.1) and the above orthogonality, it follows that for all $k > 1$,
\[ \|s_{FR}^{(k)}\|^2 = \|g^{(k)}\|^2 + \beta_{FR}^{(k)2}\,\|g^{(k-1)}\|^2 + \beta_{FR}^{(k)2}\,\beta_{FR}^{(k-1)2}\,\|g^{(k-2)}\|^2 + \cdots + \beta_{FR}^{(k)2}\,\beta_{FR}^{(k-1)2} \cdots \beta_{FR}^{(2)2}\,\|g^{(1)}\|^2. \tag{4.3.3} \]
Hence, using (2.3.15), we obtain
\[ \|s_{FR}^{(k)}\|^2 = \|g^{(k)}\|^4 \sum_{l=1}^{k} \|g^{(l)}\|^{-2}, \]
and so, from (4.3.2), we have
\[ \cos^2\theta_{FR}^{(k)} = \frac{1}{\|g^{(k)}\|^2 \sum_{l=1}^{k} \|g^{(l)}\|^{-2}}. \]
Assuming that $g^{(k)} \ne 0$ for all $k$, define
\[ \gamma^{(k)2} \triangleq \frac{\tau}{\|g^{(k)}\|^2 \sum_{l=1}^{k} \|g^{(l)}\|^{-2}} \]
with $\tau > 0$. We now use the test
\[ \cos^2\theta_{GPR}^{(k)} \ge \gamma^{(k)2}: \]
the direction $s_{GPR}^{(k)}$ is accepted if this test holds, and otherwise a restart is made with the steepest descent direction $-g^{(k)}$.
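A sketch of this switching rule in code (our reconstruction; the accumulator `inv_norm_sum`, the acceptance polarity, and all names are assumptions based on the formulas above):

```python
import numpy as np

def shanno_angle_test(g, s, inv_norm_sum, tau):
    """Compare cos^2 of the angle between s and -g with
    gamma^2 = tau / (||g||^2 * sum_l ||g_l||^-2), the scaled FR value.
    `inv_norm_sum` accumulates ||g_l||^-2 over the iterations so far.
    Returns True to keep s, False to restart with -g."""
    gg = np.dot(g, g)
    cos2 = np.dot(g, s)**2 / (gg * np.dot(s, s))
    gamma2 = tau / (gg * inv_norm_sum)
    return cos2 >= gamma2
```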