
  • Loughborough University Institutional Repository

    A theoretical and computational investigation of a generalized Polak-Ribiere algorithm for unconstrained optimization

    This item was submitted to Loughborough University's Institutional Repository by the author.

    Additional Information:

    A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

    Metadata Record: https://dspace.lboro.ac.uk/2134/13193

    Publisher: © K.M. Khoda

    Please cite the published version.

    https://dspace.lboro.ac.uk/2134/13193

  • This item was submitted to Loughborough University as a PhD thesis by the author and is made available in the Institutional Repository

    (https://dspace.lboro.ac.uk/) under the following Creative Commons Licence conditions.

    For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/


  • A THEORETICAL AND COMPUTATIONAL

    INVESTIGATION OF A GENERALIZED POLAK-RIBIERE ALGORITHM

    FOR UNCONSTRAINED OPTIMIZATION

    by

    KHAN MONZOOR-E-KHODA

    A Doctoral Thesis

    Submitted in partial fulfilment of the requirements

    for the award of the degree of

    Doctor of Philosophy

    of the Loughborough University of Technology

    February, 1992

    Supervisor: Professor C. Storey

    Department of Mathematical Sciences

    © by K. M. Khoda, 1992


  • This Thesis is dedicated to my Father and Mother

    as a token of my grateful appreciation

  • TABLE OF CONTENTS

    ACKNOWLEDGEMENTS vii

    SUMMARY OF THE THESIS viii

    CHAPTER 1 : INTRODUCTION 1

    1.1 General Nature of Optimization 1

    1.2 Unconstrained Optimization 2

    1.3 Scope and Organization of The Thesis 4

    CHAPTER 2 : MATHEMATICAL FOUNDATIONS 5

    2.1 Notation 5

    2.2 Background Material 7

    2.3 Gradient Methods of Minimization 7

    2.4 Line Search Strategies 12

    CHAPTER 3 : GENERALIZED POLAK-RIBIERE ALGORITHM 15

    3.1 Derivation of the Algorithm 16

    3.2 General Properties of the Algorithm 19

    3.3 Global Convergence Properties of the Algorithm 26

    3.4 Rate of Convergence 42

    3.5 Characteristic Behaviour and Basic Algorithm 53

    CHAPTER 4 : SOME MODIFICATIONS OF THE GPR

    ALGORITHM AND THEIR IMPLEMENTATIONS 58

    4.1 GPR Algorithm with Non-negative Beta 59


    4.2 GPR Algorithm with Powell Restart 65

    4.3 Shanno's Angle-test Restarting GPR Algorithm 67

    4.4 Efficiently Restarting GPR Algorithm 71

    4.5 Concluding Remarks 76

    CHAPTER 5 : MULTI-TERM RESTARTING GPR ALGORITHMS 77

    5.1 Beale Three-Term Restarting GPR Algorithm 78

    5.2 Nazareth Three-Term Restarting GPR Algorithm 85

    5.3 Concluding Remarks 90

    CHAPTER 6 : EXTENSION OF THE GPR ALGORITHM 91

    6.1 Theoretical Basis 91

    6.2 Algorithm Construction 93

    6.3 Implementation and Basic Algorithm 97

    6.4 Concluding Remarks 100

    CHAPTER 7 : COMPUTATIONAL EXPERIMENTS 101

    7.1 Line Search Algorithm 101

    7.2 Test Problems 105

    7.3 Numerical Results 118

    7.4 Discussion of the Results 128

    7.5 Concluding Remarks 130

    CHAPTER 8 : OPTIMIZED SOFTWARE FOR GENERAL PURPOSE USE 131

    8.1 Subroutine Structure 131

    8.2 User Interface of the GPR Routine 138

    8.3 User-specified Optional Parameters 142

    8.4 Precision of the Calculation 150

    8.5 Error Indicators 150

    8.6 Accuracy of the Solution 150

    8.7 Efficiency and Reliability 151

    8.8 A Numerical Example 151

    8.9 Concluding Remarks 155


    CHAPTER 9 : OTHER APPLICATIONS OF THE GPR ROUTINE 157

    9.1 Problems and Computational Performance 157

    9.2 Concluding Remarks 162

    CHAPTER 10 : FINAL CONCLUSIONS 164

    10.1 Summary and Comments 164

    10.2 Suggestions for Further Research 166

    APPENDIX A : SUMMARY OF FAILURES 167

    APPENDIX B : QUICK GUIDANCE 170

    APPENDIX C : COMPLETE RESULTS 174

    APPENDIX D : PROGRAM LISTINGS 282

    APPENDIX E : OTHER DEPENDENT SUBROUTINES 351

    REFERENCES 358

  • ACKNOWLEDGEMENTS

    I am greatly indebted to my supervisor Professor C. Storey of Loughborough

    University of Technology for his guidance and help throughout this work. I

    would like to acknowledge especially his enormous effort in correcting my written

    English. I would also like to thank him for introducing me to the interesting field

    of Optimization. I take this opportunity to express my gratitude to him for all

    the advice and encouragement I have received from him. I also acknowledge the

    productive interactions that I had with Professor Evans of Computer Studies. I

    wish to thank Louise and Helen for helping me with the typesetting.

    I am very grateful to my director of research Dr. A. C. Pugh for the enormous

    support he gave me throughout my candidature. I sincerely acknowledge all the

    assistance I obtained from Mr. R. Tallet and Dr. M.A. Rahin.

    I express my deep gratitude to my parents for their patience, to my brothers

    for their understanding and to my in-laws for rendering valuable support. I

    especially record my debt of gratitude to my father Professor A.F.M. Khodadad

    Khan, whose constant advice and encouragement has always been a source of

    inspiration for me. My gratitude is also due to my wife Ellora for her constant

    inspiration and mental support to achieve my goal.

    I would also like to express my gratitude to the Department of Mathematical

    Sciences of Loughborough University of Technology for supporting me throughout

    my candidature. I sincerely thank the Pilkington Library and the Computer

    Centre of Loughborough University of Technology for generously letting me use

    their facilities. Finally, I would like to gratefully acknowledge the Commonwealth

    Scholarship Commission and the British Council for awarding me a scholarship,

    during the tenure of which, this research was carried out.


  • SUMMARY OF THE THESIS

    TITLE

    A Theoretical and Computational Investigation of a Generalized Polak-

    Ribiere Algorithm for Unconstrained Optimization.

    ABSTRACT

    In this thesis, a new conjugate gradient type method for unconstrained

    minimization is proposed and its theoretical and computational properties investigated. This generalized Polak-Ribiere method is based on the study of the effects

    of inexact line searches on conjugate gradient methods. It uses search directions

    which are parallel to the Newton direction of the restriction of the objective

    function on a two dimensional subspace spanned by the current gradient and a

    suitably chosen direction in the span of the previous search direction and the

    current gradient. It is shown that the GPR method (as it is called) has excellent

    convergence properties under very simple conditions. An algorithm for the new

    method is formulated and various implementations of this algorithm are tested.

    The results show that the GPR algorithm is very efficient in terms of number

    of iterations as well as computational labour and has modest computer storage

    requirements.

    The thesis also explores extensions of the GPR algorithm by considering

    multi-term restarting procedures. Further generalization of the GPR method

    based on (m + 1)-dimensional Newton methods is also studied.

    Optimized software for the implementation of the GPR algorithm is developed for general purpose use. By considering standard test problems, the

    superiority of the proposed software over some readily available library software

    and over the straightforward Polak-Ribiere algorithm is shown. Software and

    user interfaces together with a simple numerical example and some more practical

    examples are described for the guidance of the user.


  • CHAPTER 1 INTRODUCTION

    This Thesis is an attempt to add to the theory of nonlinear optimization

    which, of late, has emerged as a useful branch of applied mathematics. In the introductory chapter, we discuss briefly the nature of optimization with special

    emphasis on the solution of unconstrained problems and give an outline of our

    work.

    1.1 General Nature of Optimization

    Optimization is concerned with getting the best from a given situation

    by analysing a set of alternative decisions. This is achieved by selecting a

    performance index for the situation under assessment, expressing it in terms of

    certain decision variables and then obtaining its best possible value by systematic

    adjustment of the variables. The choice of the performance index differs from

    situation to situation but generally involves some economic considerations, e.g.,

    maximum return on investment, minimum cost per unit yield, etc. It may

    also involve some technical considerations such as minimum time of production,

    maximum efficiency of machines and so on.

    Optimization problems arise in a variety of practical situations. The way

    in which the performance index is obtained from the variables of a problem

    also varies widely from one situation to another. In some cases, it can only be qualitatively described, whereas mathematical models of many other problems can

    be formulated in which the performance indices are described by some suitably

    defined objective functions. In the latter case, the problem then reduces to a mathematical programming problem for finding the minimum or maximum value

    of the objective function.


    Mathematical modeling of optimization in many real-life situations leads

    to constrained problems in which the variables are restricted in some way -

    sometimes by having simple upper and lower bounds and sometimes by complex

    functional constraints. In fact, many complex problems such as, for instance, the

    production policy of a big company and the management of a large network are

    best treated by decomposing them into separate subproblems - each subproblem

    having constraints which are imposed to restrict its scope. On the other hand,

    many constrained problems can be converted to unconstrained ones in which the

    variables are free to assume all possible values, either by broadening the scope

    of the problem or by eliminating some variables using the constraints. Moreover,

    the unconstrained problems represent a significant class of practical problems.

    Optimization problems have attracted the attention of researchers for a long

    time. The earlier problems investigated were geometrical in nature. Later on,

    with the development of calculus, a formal theory of optimization grew up. This

    classical theory, though rich in theoretical content, is not of much practical value

    in numerical computation, especially in dealing with large-scale problems.

    Since the advent of electronic computers in the nineteen forties, there has

    been a rapid development of theory and practice of optimization. There is now a

    massive literature on the subject and vigorous research is still in progress creating

    new theory and testing various algorithms. Recent advances in the power and

    storage capacities of digital computers have made it possible to deal with large-

    scale optimization problems efficiently.

    1.2 Unconstrained Optimization

    A static unconstrained optimization problem is concerned with finding a local

    minimum or maximum of a prescribed real-valued function f : R^n → R of n real variables without any constraint on the variables. Without loss of generality, one

    may restrict consideration to minimization problems only, because maximization

    can be dealt with by minimization of −f(x_1, …, x_n).

    Numerous methods have been devised for solving general minimization

    problems, the choice and suitability of any particular method being dependent

    on the nature and size of the problem. These methods are, in general, iterative

    in nature and give procedures for obtaining a sequence of approximate solutions


    converging to the actual solution. In practice, such methods start at an initial

    estimate of the minimizer and then proceed, according to some fixed rule, to

    better and better approximations, terminating at the actual minimizer or at an

    acceptable (according to pre-set standards) approximation of the minimizer after

    a finite number of iterations. For surveys of some of these techniques, we refer to

    Dennis and Schnabel[D1], Gill, Murray and Wright[G1], Wolfe[W1], Walsh[W4], Zoutendijk[Z1].

    There are some methods in which the generation of the minimizing sequence

    is based simply on comparison of values of the objective function and no use of

    derivatives is made. These so-called direct search methods were once thought to be

    useful in dealing with problems in which the objective function is not differentiable

    or its partial derivatives are hard to evaluate. They are, however, very crude and

    generally prove to be less efficient than methods making use of derivative values

    no matter how these have to be evaluated.

    Problems involving smooth objective functions are best dealt with by

    gradient methods. In such methods the minimizing sequence is generated by

    determining at each step a direction of search and then locating the best possible

    estimate of the minimum point in the line of that direction through an appropriate

    choice of the steplength. The search direction at each step, constructed using the

    gradient values and sometimes the Hessian values also, is required to be such

    that function values initially decrease in that direction. The primary differences

    between various gradient methods rest with the way in which the successive

    search directions are constructed. Once this is done, all such algorithms call for

    choosing the minimum point on the corresponding line (exact line search), though,

    in practice, one is satisfied if the steplength satisfies some accepted minimizing

    criterion (inexact line search).

    The development of efficient algorithms for solving unconstrained optimiza-

    tion problems is still an important area of research. This importance is derived

    not only from the desire to solve unconstrained problems, but also from the use

    made of these algorithms in constrained optimization. Indeed, unconstrained

    optimization lies at the heart of the whole of nonlinear optimization.

    In the next chapter, we shall give a short account of some gradient methods

    of unconstrained minimization as an introduction to our work.


    1.3 Scope and Organization of The Thesis

    In this thesis, we are concerned with the static unconstrained optimization

    problem

    P : Minimize f(x), x ∈ R^n,

    where the objective function f : R^n → R is, in general, a nonlinear function and is at least twice continuously differentiable. Our study begins with a short review of

    some basic results and solution techniques in Chapter 2. Then in Chapter 3, we

    develop a new conjugate-gradient type algorithm which is a generalization of the

    Polak-Ribiere algorithm and discuss its theoretical and algorithmic properties.

    This algorithm, referred to as the Generalized Polak-Ribiere (GPR, in short)

    Algorithm in the sequel, is extended and further examined in Chapter 4 and

    Chapter 5. An (m + 1)-dimensional version of the GPR Algorithm is considered in Chapter 6 and various computational results are discussed in Chapter 7. The

    efficiency of the Algorithm and optimized software for its implementation (called

    the GPR Routine) are investigated in Chapter 8. The GPR Routine is applied to

    some practical problems in Chapter 9 and final conclusions are made in Chapter

    10.

  • CHAPTER 2 MATHEMATICAL FOUNDATIONS

    In this Chapter, we set out the notation to be used throughout the Thesis,

    discuss some basic results and give short accounts of some solution techniques.

    2.1 Notation

    In this study, the Euclidean n-space will be denoted by R^n with R^1 = R, the real line. The points x in R^n will be considered as column vectors:

    x = (x_1, x_2, …, x_n)^T,   (2.1.1)

    the corresponding row vector being

    x^T = (x_1, x_2, …, x_n).   (2.1.2)

    The subscript i, always ranging from 1 to n (unless otherwise specified), will be reserved to indicate vector components, whereas the superscript (k) will be used to distinguish vectors as x^(1), x^(2), …. We shall write x^T z and ||x|| to indicate the Euclidean inner product and norm respectively:

    x^T z = Σ_{i=1}^n x_i z_i,   (2.1.3)

    ||x|| = (x^T x)^{1/2}.   (2.1.4)

    B(x, ε) will denote the ε-ball about x in R^n:

    B(x, ε) = {z ∈ R^n : ||z − x|| < ε}.   (2.1.5)

    The elements of a matrix will be indicated by double subscripts, the first

    index indicating the row and the second index the column. For an n × n matrix A, ||A|| will denote the induced Euclidean norm.

    Our notation for the objective function will always be f(·) in the general case and q(·) in the quadratic case. The gradient vector and the Hessian matrix of the objective function will be denoted by g(·) and G(·) respectively. Thus, in the general case with f : R^n → R,

    g(x) = ∇f(x),  G(x) = ∇²f(x).   (2.1.6)

    In an iterative process for finding the minimum of f(x), we shall denote the starting point by x^(1) and the subsequent iterates by x^(2), x^(3), etc., and write

    d^(k) = x^(k+1) − x^(k),  y^(k) = g^(k+1) − g^(k).   (2.1.7)

    The search direction at the k-th step will be denoted by s^(k) and the steplength in this direction by α^(k), so that

    x^(k+1) = x^(k) + α^(k) s^(k).   (2.1.8)

    C^r will denote the class of r-times continuously differentiable functions f : R^n → R.

    (v^(1) ∧ v^(2)) will denote the angle between the two vectors v^(1) and v^(2).

    k ∈ n_1, n_2 will be used to mean that the integer variable k may assume values n_1 through n_2.

    For x^(1), x^(2) ∈ R^n with x^(1) ≠ x^(2), the line-segment from x^(1) to x^(2) will be denoted by [x^(1), x^(2)] when end points are included and by (x^(1), x^(2)) when end points are excluded.

    ≜ will be used to indicate a definition and ∎ will mean the end of a proof.

    of a proof.

    For convenience of reference, we shall number some statements (equations).

    This will be done serially in a section, and will be referred to as (a.b.c), where a is

    the chapter number, b is the section number and c is the statement number. The

    introductory portion of a chapter is numbered section 0.

    The lemmas, propositions and theorems will be numbered serially in a

    chapter as a.b, where a is the chapter number and b = 1,2, etc.

    The tables and figures will also be numbered serially in a section as (a.b.c),

    where a and b are the chapter and section number respectively and c = 1,2, etc.

    2.2 Background Material

    We shall freely use various notions and results from analysis, linear algebra

    and optimization theory in our work. All the relevant material used can be

    found in standard texts in analysis, linear algebra and optimization (for example,

    the text by Dennis and Schnabel[D1] has introductory sections dealing with this

    background material).

    2.3 Gradient Methods of Minimization

    As remarked in Section 1.2, a gradient method for minimizing a smooth nonlinear function f(x) under no constraints calls for generating a search direction s^(k) at each iteration and a steplength α^(k) in that direction so as to determine the next point

    x^(k+1) = x^(k) + α^(k) s^(k),   (2.3.1)

    satisfying the descent criterion

    f(x^(k+1)) < f(x^(k)).   (2.3.2)


    The process stops at x^(m) if g^(m) = 0 or yields a sequence {x^(k)} of points converging to an approximation to a local minimizer x* which satisfies some convergence criterion.

    One of the oldest methods is the method of steepest descent, first introduced

    by Cauchy[C1]. In this method the directions of search are taken as

    s^(k) = −g^(k).   (2.3.3)

    This choice is motivated by the fact that, local to the current approximation, the

    negative gradient direction is the direction along which the function decreases

    most rapidly. The steepest descent algorithm, though simple and stable (that

    is, reduces the function value at each step), has the disadvantage of linear

    convergence which may, at times, be extremely slow, and so it is not suitable

    for practical use.
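    To make the iteration concrete, here is a minimal sketch of steepest descent in Python; the function names, the crude halving rule standing in for a proper line search, and the tolerance are illustrative assumptions, not prescriptions from the thesis:

```python
import numpy as np

def steepest_descent(f, grad, x, tol=1e-6, max_iter=10000):
    # s(k) = -g(k) at every step, as in (2.3.3); a crude halving rule
    # stands in for the line search strategies of Section 2.4.
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stop near a stationary point
            break
        s, alpha, fx = -g, 1.0, f(x)
        while f(x + alpha * s) >= fx and alpha > 1e-16:
            alpha *= 0.5                     # enforce only the descent criterion (2.3.2)
        x = x + alpha * s
    return x
```

    On an ill-conditioned quadratic this sketch exhibits exactly the slow linear convergence just described.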

    Another basic minimization technique is the Newton method, based on the classical Newton method for solving nonlinear equations (Fletcher[F1], Dennis and Schnabel[D1], Gill, Murray and Wright[G1]). In this method, the directions of search are calculated from

    s^(k) = −[G^(k)]^{−1} g^(k),   (2.3.4)

    or equivalently from the linear system

    G^(k) s^(k) = −g^(k).   (2.3.5)

    The idea behind this method is that a function may be locally approximated

    by a quadratic whose minimum can be reached in one step by the above choice

    of direction. The Newton algorithm for a general function is not necessarily

    convergent, but for C² functions with positive definite Hessian at x*, convergence is quadratic under mild restrictions on f (Fletcher[F1], Wolfe[W1]) if x^(k) is near enough to x* for some k. This rapid convergence property makes the method

    extremely efficient in many cases. However, the method has the disadvantages

    that it involves a large amount of computation at each step, in the way of

    calculating and inverting the Hessian, or solving a system of linear equations

    and it requires quite a large amount of storage in its implementation.
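    A sketch of one Newton iteration under the same illustrative conventions; solving the linear system (2.3.5) is preferred to forming the inverse explicitly:

```python
import numpy as np

def newton_step(grad, hess, x):
    # Solve G(x) s = -g(x) for the Newton direction (2.3.5); this costs
    # O(n^3) work and O(n^2) storage per iteration - the drawback noted above.
    s = np.linalg.solve(hess(x), -grad(x))
    return x + s   # unit steplength; in practice a line search safeguards the step
```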


    In a bid to eliminate some of the computational disadvantages of the Newton method, the so-called quasi-Newton (abbreviated as QN) methods have been developed. These methods, first introduced by Davidon[D4] and later clarified by Fletcher and Powell[F4], have the general feature that the search directions are given by

    s^(k) = −H^(k) g^(k),   (2.3.6)

    where H^(k) is an approximation to [G^(k)]^{−1} (or G^(k) itself) with H^(1) symmetric, positive-definite (usually, H^(1) = I_n, the n × n identity matrix) and the so-called quasi-Newton condition

    H^(k+1) y^(k) = d^(k)   (2.3.7)

    holds. Besides the Davidon-Fletcher-Powell (DFP) updating formula

    H^(k+1) = H^(k) + d^(k) d^(k)T / (d^(k)T y^(k)) − H^(k) y^(k) y^(k)T H^(k) / (y^(k)T H^(k) y^(k)),   (2.3.8)

    there are now a variety of QN procedures differing in the ways in which the matrices H^(k) are updated (Fletcher[F1], Dennis and Schnabel[D1], Dennis and More[D5]). A well-known group of updating matrices is Broyden's θ-family (Broyden[B4]):

    H^(k+1) = H^(k) − u^(k) u^(k)T / v^(k) + d^(k) d^(k)T / η^(k) + (θ^(k) / v^(k)) (u^(k) − (v^(k)/η^(k)) d^(k)) (u^(k) − (v^(k)/η^(k)) d^(k))^T,   (2.3.9a)

    where

    η^(k) = d^(k)T y^(k),  u^(k) = H^(k) y^(k),  v^(k) = u^(k)T y^(k)   (2.3.9b)

    and θ^(k) is a free parameter. The DFP formula is a particular member of this class (θ^(k) = 0). Another particular member (θ^(k) = 1) is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula (Broyden[B4], Fletcher[F5], Goldfarb[G5], Shanno[S7]):

    H^(k+1) = H^(k) + (1 + y^(k)T H^(k) y^(k) / η^(k)) d^(k) d^(k)T / η^(k) − (d^(k) y^(k)T H^(k) + H^(k) y^(k) d^(k)T) / η^(k),   (2.3.10)


    which is still considered to be the most effective of the QN methods (Shanno and

    Phua[S5]).
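    As an illustration of the recurrences above, a sketch of the BFGS inverse-Hessian update (2.3.10) in Python; the argument names are illustrative, with d and y the difference vectors of (2.1.7):

```python
import numpy as np

def bfgs_update(H, d, y):
    # H approximates the inverse Hessian; d = x(k+1) - x(k), y = g(k+1) - g(k).
    # The update preserves symmetry and, when d'y > 0, positive-definiteness.
    dy = d @ y                       # curvature term d^T y
    Hy = H @ y
    return (H
            + (1.0 + (y @ Hy) / dy) * np.outer(d, d) / dy
            - (np.outer(d, Hy) + np.outer(Hy, d)) / dy)
```

    The n × n matrix H carried between iterations is exactly the storage burden that motivates the conjugate gradient methods discussed next.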

    The QN methods have a serious disadvantage, in the case of large scale

    problems, and that is the need to store matrices in their implementation. At-

    tempts to avoid this difficulty have stimulated research in the area of conjugate

    gradient (abbreviated as CG) methods which call only for vectors in their

    implementation. Originally proposed by Hestenes and Stiefel[H1] to solve systems

    of linear equations, the CG method was first applied to minimization problems by

    Fletcher and Reeves[F2]. The underlying idea is that the minimum of a quadratic function

    q(x) = (1/2) x^T A x + b^T x + c,   (2.3.11)

    where A is symmetric and positive-definite, is obtained in at most n steps through exact line search along each of n mutually A-conjugate directions. In this case, the CG search directions are chosen as

    s^(k) = −g^(k) for k = 1,  s^(k) = −g^(k) + β^(k) s^(k−1) for k > 1,   (2.3.12)

    satisfying the descent condition

    g^(k)T s^(k) < 0   (2.3.13)

    at each step, with the β^(k) chosen so that the conjugacy conditions

    s^(i)T A s^(j) = 0,  i, j ∈ 1, n;  i ≠ j   (2.3.14)

    are satisfied.

    Several formulae for β^(k) have been obtained. Of these, the FR formula (Fletcher and Reeves[F2])

    β_FR^(k) = ||g^(k)||² / ||g^(k−1)||²,   (2.3.15)

    the PR formula (Polak and Ribiere[P1])

    β_PR^(k) = g^(k)T (g^(k) − g^(k−1)) / ||g^(k−1)||²   (2.3.16)


    and the HS formula (Hestenes and Stiefel[H1], Sorenson[S1])

    β_HS^(k) = g^(k)T (g^(k) − g^(k−1)) / (s^(k−1)T (g^(k) − g^(k−1)))   (2.3.17)

    are often used. These different formulae for β are completely equivalent on quadratics when exact line searches are used. They can also be used on general nonlinear functions f(·), but then their computational behaviour and efficiency differ considerably from one formula to another.
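    For concreteness, a sketch of the CG direction update with the PR choice of β (illustrative Python; g_new, g_old and s_old denote the current gradient, previous gradient and previous search direction):

```python
import numpy as np

def pr_direction(g_new, g_old, s_old):
    # Polak-Ribiere beta (2.3.16) inserted into the CG recurrence (2.3.12);
    # only a few n-vectors are needed - no matrices are stored.
    beta = g_new @ (g_new - g_old) / (g_old @ g_old)
    return -g_new + beta * s_old
```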

    Theoretical and computational properties of different CG methods have been

    investigated by many authors (Beale[B1], Crowder and Wolfe[C3], Powell[P3],

    Baptist and Stoer[B6], Stoer[S6], Cohen[C2], Fletcher[F3], Shanno[S2], Hu and

    Storey[H3], Wolfe[W2,W3]). Though the FR method has nice global convergence

    under very mild conditions (Zoutendijk[Z2], Powell[P5], Al-Baali[A1]), no such

    satisfactory global convergence results are available for the PR and HS methods

    (Gilbert and Nocedal[G3]). It has also been observed (Powell[P4]) that the PR

    method is unlikely to have global convergence without some restrictive conditions.

    On the other hand, the numerical performance of the PR method has been found

    to be superior to that of the FR method in most cases. Recently, some quite

    efficient hybrid CG methods have been proposed (Touati-Ahmed and Storey[Tl],

    Gilbert and Nocedal[G3]). Attempts to improve upon the performance of the

    CG methods have also led to some generalizations. These include Beale's and

    Nazareth's three term recurrence methods (Beale[B1], Dixon, Ducksbury and Singh[D2], Nazareth[N4,N3], Dixon[D3]) and the generalized CG method of Liu and Storey[L2]. This latter method (abbreviated as the LS method) is in fact a two-dimensional Newton method in the sense that it uses as the next search direction s^(k) the Newton direction of the restriction of f on span{g^(k), s^(k−1)}, with g^(k) the current gradient and s^(k−1) the previous search direction. Thus the LS algorithm uses the search direction

    s^(k) = −(λ^(k) g^(k) + μ^(k) s^(k−1)),   (2.3.18a)

    with

    (λ^(k), μ^(k))^T = [ g^(k)T G^(k) g^(k)   g^(k)T G^(k) s^(k−1) ; s^(k−1)T G^(k) g^(k)   s^(k−1)T G^(k) s^(k−1) ]^{−1} [ g^(k)T g^(k) ; g^(k)T s^(k−1) ].   (2.3.18b)


    Both the QN and the CG methods have their advantages and disadvantages.

    There have been several attempts to combine the two methods so as to obtain

    algorithms with the good convergence properties of the QN methods and low stor-

    age requirements of the CG methods. Work along these lines includes Perry[P6], Shanno[S3,S4], Buckley[B5], Shanno and Phua[S8], Nazareth[N1], Nocedal[N2], Buckley and LeNir[B2,B3], Liu and Nocedal[L1] and Gill and Murray[G4]. As a

    result, some variable storage CG methods or limited memory QN methods have

    been developed having a good trade-off between memory and efficiency.

    2.4 Line Search Strategies

    Any descent method of function minimization involves a one-dimensional

    line search at each iteration for locating the next acceptable approximation to the minimizer. Thus, at the current point x^(k), if g^(k) ≠ 0, we choose a descent direction s^(k) satisfying (2.3.13) and then determine an admissible steplength α^(k) > 0 such that the descent criterion (2.3.2) is satisfied at the next point x^(k+1) defined by (2.3.1). The descent condition (2.3.13) ensures that for all sufficiently small α > 0, f(x^(k) + αs^(k)) < f(x^(k)), and hence one can always choose α^(k) > 0 such that (2.3.2) holds. In practice any α ∈ (0, α_max^(k)), where

    α_max^(k) ≜ min{α > 0 : f(x^(k) + αs^(k)) = f(x^(k))}  (α_max^(k) ≜ ∞ if no such α exists),   (2.4.1)

    is accepted as α^(k), subject to certain conditions to ensure a sufficient decrease f^(k) − f^(k+1) in the function value. Notice that α_max^(k) is the least positive number for which f(x^(k) + α_max^(k) s^(k)) = f(x^(k)) if such a number exists; otherwise α_max^(k) = ∞.

    In exact line search at x^(k), the steplength α^(k) is taken to be the value of α that minimizes the function

    φ^(k)(α) ≜ f(x^(k) + αs^(k))   (2.4.2)

    in (0, α_max^(k)), provided such a minimizer exists. Thus, according to exact line search,

    α^(k) = arg min{φ^(k)(α) : α ∈ (0, α_max^(k))}.   (2.4.3)

    Assuming the existence of stationary points of φ^(k)(·) in (0, α_max^(k)), we then have the exact line search condition

    g^(k+1)T s^(k) = 0.   (2.4.4)


    The determination of α^(k) by exact line search involves the minimization of the nonlinear function φ^(k), or solving the nonlinear equation φ^(k)′(α) = 0, which is usually expensive to carry out. Moreover, φ^(k) may not have a minimizer or a stationary point in (0, ∞). Therefore, exact line search has only theoretical importance and in practice, alternative inexact line search strategies are preferred.

    Indeed, many efficient line search techniques have been proposed and tested.

    These are, in fact, based on a "one dimensional" minimization using a combination

    of interval reduction and quadratic or cubic interpolation techniques depending

    on the availability of gradient information. For a discussion of such inexact line

    search methods, we refer to Fletcher[F1], Dennis and Schnabel[D1], Gill, Murray and Wright[G1], Wolfe[W1].

    In choosing a steplength α^(k) at a current point x^(k), we need to stay away from the end points of the interval (0, α_max^(k)) in order to produce a significant decrease in the function value. The Goldstein requirement (Goldstein[G6])

    f^(k) − f^(k+1) ≥ −c_1 α^(k) g^(k)T s^(k),   (2.4.5)

    with 0 < c_1 < 1/2, ensures that α^(k) is not too close to α_max^(k) by restricting the average rate of decrease of f(x) in moving from x^(k) to x^(k+1) along s^(k) to be at least some prescribed fraction of the initial rate of decrease in that direction (see Figure 2.4.1 below). On the other hand, the Wolfe condition (Wolfe[W2,W3])

    g^(k+1)T s^(k) ≥ c_2 g^(k)T s^(k),   (2.4.6)

    with 0 < c_2 < 1, ensures that α^(k) is not too small by requiring the rate of decrease of f at x^(k+1) in the direction s^(k) to be larger than some prescribed fraction of the initial rate of decrease (see Figure 2.4.1 below). The restriction 0 < c_1 < c_2 < 1 guarantees that (2.4.5) and (2.4.6) can be satisfied by some α^(k) ∈ (0, α_max^(k)) (Wolfe[W2], Powell[P8]).

    In recent studies, the strong Wolfe condition

    |g^(k+1)T s^(k)| ≤ −c_2 g^(k)T s^(k),   (2.4.7)

    together with the Goldstein condition (2.4.5) subject to 0 < c_1 < c_2 < 1, are often preferred as line search requirements (Fletcher[F1], Al-Baali[A1], Al-Baali and Fletcher[A2], Liu and Storey[L2]). We call the combination of these two conditions the Wolfe-Powell Conditions. Conditions (2.4.5) and (2.4.7) are sometimes referred to as strong Wolfe conditions (Gilbert and Nocedal[G3]).

    Figure 2.4.1. Permissible range for α^(k) under conditions (2.4.5) and (2.4.6): (2.4.5) holds on an interval reaching from 0 to a point short of α*, (2.4.6) on an interval bounded away from 0, and both hold on the overlap.

    It is remarked that the value of c_2 determines the accuracy with which α^(k) approximates a stationary point of f along s^(k), and consequently provides a means of controlling the balance of effort to be expended in computing α^(k). In general, the smaller the value of c_2, the more accurate the line search is. Obviously, if c_2 = 0, the line search is exact.
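    For illustration, a Python sketch that tests a trial steplength against the Wolfe-Powell pair (2.4.5) and (2.4.7); the constants c1 and c2 are illustrative defaults, not values prescribed by the thesis:

```python
import numpy as np

def wolfe_powell_ok(f, grad, x, s, alpha, c1=1e-4, c2=0.1):
    # (2.4.5): sufficient average decrease; (2.4.7): strong curvature test.
    gTs = grad(x) @ s                    # initial rate of decrease, < 0
    x_new = x + alpha * s
    sufficient_decrease = f(x_new) <= f(x) + c1 * alpha * gTs
    strong_curvature = abs(grad(x_new) @ s) <= -c2 * gTs
    return sufficient_decrease and strong_curvature
```

    A practical routine would bracket and interpolate until both tests pass; taking c2 closer to 0 tightens (2.4.7) towards the exact condition (2.4.4).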

  • CHAPTER 3 GENERALIZED POLAK-RIBIERE ALGORITHM

    In this Chapter, we develop a new type of conjugate gradient algorithm for

    finding a local solution to the problem

    [P] Minimize f(x), x ∈ R^n,

    and discuss various theoretical properties of the algorithm. The search directions

    in the algorithm, as we shall see, are generalizations of those in the Polak-Ribiere

    method, and so the algorithm is called the Generalized Polak-Ribiere Algorithm

    (GPR Algorithm, in short).

    In [P], the objective function f : R^n → R is, in general, nonlinear and it is assumed throughout the sequel (whether stated explicitly or not) that the following conditions hold:

    [AP-1] f is twice continuously differentiable.

    [AP-2] ∃ x^(1) ∈ R^n such that the level set

    L(x^(1)) ≜ {x ∈ R^n : f(x) ≤ f(x^(1))}   (3.0.1)

    is bounded.

    Additional conditions will be added whenever necessary.

    It may be observed that

    (i) By [AP-1], the Hessian G(x) is symmetric for all x ∈ R^n.

    (ii) By [AP-1], the level set L(x^(1)) in [AP-2] is closed and hence it is compact.

    (iii) The objective function f(·), the gradient g(·) and the Hessian G(·), being continuous, are bounded on the compact set L(x^(1)), with x^(1) as in [AP-2]. Defining

    M ≜ sup{||G(x)|| : x ∈ L(x^(1))},   (3.0.2)

    we have then

    ||G(x)|| ≤ M for all x ∈ L(x^(1)).   (3.0.3)

    3.1 Derivation of the Algorithm

    We begin with an estimate x^(1) of a local minimizer x* of f and take the initial search direction as the steepest descent direction at x^(1):

    s^(1) = −g^(1).   (3.1.1)

    To determine the search direction s^(k) for the k-th iteration (k > 1) from the current point x^(k), we proceed as follows:

    Let F(x + αs) denote the quadratic approximation to f(x + αs), obtained by truncating the Taylor series expansion of f(x + αs):

    F(x + αs) = f(x) + α g^T s + (1/2) α² s^T G s,   (3.1.2)

    where g ≜ g(x) and G ≜ G(x). Assuming that G(x) is positive-definite, we can write

    min_α (F(x + αs) − f(x)) = −(1/2) (g^T s)² / (s^T G s) = −(1/2) V,   (3.1.3)

    where V = V(x, s) is given by

    V = (g^T s)² / (s^T G s),   (3.1.4)

    and the minimum occurs for

    α = −g^T s / (s^T G s).   (3.1.5)

    We now set

    s = −g + βp,   (3.1.6)

    where p is an arbitrary but fixed vector in R^n such that p and g are linearly independent and β is a nonzero real variable, and minimize (3.1.3) as a function of β. This demands that we choose β such that

    V(β) = (g^T(−g + βp))² / ((−g + βp)^T G (−g + βp)) = (g^T g − β g^T p)² / ((−g + βp)^T G (−g + βp))   (3.1.7)

    is maximal. Here the denominator is positive for all β in view of the positive-definiteness of G.

    The value of β for which (3.1.7) is maximal must satisfy the equation

    dV/dβ = 0,   (3.1.8)

    obtained by setting the derivative of (3.1.7) to zero. For g^T p = 0, the maximizing root is

    β = g^T G p / (p^T G p).   (3.1.9)

    In the general case, (3.1.8) has the two roots

    β_1 = g^T g / (g^T p)   (3.1.10)

    and

    β_2 = (g^T p · g^T G g − g^T g · g^T G p) / (g^T p · g^T G p − g^T g · p^T G p),   (3.1.11)

    provided the denominator in β_2 is nonzero. The search direction

    s_1 = −g + β_1 p

    corresponding to (3.1.10) is not a descent direction as g^T s_1 = 0 and is of no importance to us (in fact β_1 makes V take its minimum value 0). The search direction

    s_2 = −g + β_2 p

    corresponding to (3.1.11) forms the basis of the proposed algorithm. The LS algorithm, studied in Liu and Storey[L2] and Hu and Storey[H2], is also based on s_2. Notice that (3.1.11) reduces to (3.1.9) for g^T p = 0.
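    For the reader's convenience, a short verification (in LaTeX, using only the definitions above) of where the two roots come from; the displayed forms of (3.1.9)-(3.1.11) are as reconstructed here:

```latex
% Roots of dV/d\beta = 0 for V as in (3.1.7).
\[
V(\beta)=\frac{(g^Tg-\beta\,g^Tp)^2}{D(\beta)},\qquad
D(\beta)=(-g+\beta p)^T G\,(-g+\beta p).
\]
% Differentiating and clearing the positive factor D(\beta)^2, the
% numerator of dV/d\beta factors as
\[
\frac{dV}{d\beta}\;\propto\;(g^Tg-\beta\,g^Tp)
\Bigl[\,g^Tp\,D(\beta)-(g^Tg-\beta\,g^Tp)\bigl(g^TGp-\beta\,p^TGp\bigr)\Bigr].
\]
% The first factor vanishes at \beta_1 = g^Tg/g^Tp, for which g^Ts_1 = 0 and
% V(\beta_1) = 0; in the second factor the \beta^2 terms cancel, leaving a
% linear equation whose root is \beta_2 of (3.1.11), reducing to
% \beta = g^TGp/p^TGp when g^Tp = 0.
```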

    We now let

    p = s⁻ + γ g   (3.1.12)

    in span{s⁻, g}, where s⁻ is the search direction in the previous iteration, and γ ≠ 0 is determined so that

    g^T p = 0.   (3.1.13)

    This requires

    γ = −g^T s⁻ / (g^T g).   (3.1.14)

    The current search direction is then defined by (3.1.6) with p described by (3.1.12) and (3.1.14) and β given by (3.1.9).

    If we denote p by s̄, we then have the following iterative process for the GPR algorithm from the initial estimate x^(1) for the minimizer x*:

    s^(k) = −g^(1) for k = 1,   (3.1.15a)

    s^(k) = −g^(k) + β_GPR^(k) s̄^(k−1) for k > 1,   (3.1.15b)

    s̄^(k−1) = s^(k−1) − (g^(k)T s^(k−1) / g^(k)T g^(k)) g^(k) for k > 1,   (3.1.15c)

    β_GPR^(k) = g^(k)T G^(k) s̄^(k−1) / (s̄^(k−1)T G^(k) s̄^(k−1)) for k > 1.   (3.1.15d)

  • Chapter 9: Generalized Polak-Ribiere Algorithm 19

    It may be remarked that the stopping condition will be activated whenever g^(k) = 0 at any iteration and so we can assume that g^(k) ≠ 0 as long as the iteration continues. Moreover, it follows from (3.1.15c) that

    g^(k)T s̄^(k−1) = 0   (3.1.16)

    and hence, from (3.1.15b), we have

    g^(k)T s^(k) = −||g^(k)||² < 0   (3.1.17)

    as long as g^(k) ≠ 0. This shows that:

    Proposition 3.1. In the GPR Algorithm, s^(k) is a descent direction from x^(k).

    The steplength α^(k) at each iteration is determined by a one-dimensional line search (see Section 2.4) along s^(k) so that

    f(x^(k+1)) < f(x^(k)).   (3.1.18)

    For an exact line search,

    α^(k) = arg min{f(x^(k) + α s^(k)) : α > 0},   (3.1.19)

    so that the exact line search condition

    g^(k+1)T s^(k) = 0   (3.1.20)

    holds.
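    A minimal Python sketch of one GPR direction computation, using the form of (3.1.15) as reconstructed above; hess_vec is an illustrative stand-in for however the product G^(k) v is obtained (exactly, or by a finite difference of gradients):

```python
import numpy as np

def gpr_direction(g, s_prev, hess_vec):
    # (3.1.15c): remove the gradient component so that g' s_bar = 0.
    s_bar = s_prev - (g @ s_prev) / (g @ g) * g
    Gs = hess_vec(s_bar)                 # G(x_k) s_bar, or an approximation
    beta = (g @ Gs) / (s_bar @ Gs)       # (3.1.15d)
    return -g + beta * s_bar             # (3.1.15b)
```

    In a matrix-free implementation one might take, say, hess_vec = lambda v: (grad(x + h*v) - grad(x)) / h for a small h; this keeps the storage requirement at a few n-vectors.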

    3.2 General Properties of the Algorithm

    In this section we consider some general properties of the GPR algorithm, initiated at x^(1) satisfying [AP-2]. Besides the general problem [P], the quadratic case, namely,

    [Q] Minimize q(x), x ∈ R^n,

    where

    q(x) = (1/2) x^T A x + b^T x + c,   (3.2.1)

    will also be considered. In dealing with [Q], it will be assumed throughout the sequel that

    [AQ] The Hessian A is symmetric and positive-definite.

    It may be noted that for the quadratic function q(·),

    g(x) = Ax + b,  G(x) = A   (3.2.2)

    and hence

    y^(k) = α^(k) A s^(k),   (3.2.3)

    where y^(k) is as in (2.1.7). In this case, the steplength satisfies

    (3.2.4)

    In case of exact line search, (3.2.4) becomes

    α^(k) = g^(k)T g^(k) / (s^(k)T A s^(k))   (3.2.5)

    by the exact line search condition (3.1.20) and the descent condition (3.1.17).

    In what follows, unless explicitly referred to [Q], we shall consider that the GPR algorithm is applied to [P].

    The GPR algorithm has the property that s^(k) is conjugate to s̄^(k−1) for all k > 1. Indeed, we have,

    Proposition 3.2. The GPR algorithm satisfies

    s^(k)T G^(k) s̄^(k−1) = 0   (3.2.6)

    for all k > 1.

    Proof. This follows directly from (3.1.15b) and (3.1.15d). ∎

    By applying the mean value theorem to g(·), we obtain, according to the GPR algorithm,

    y^(k) = g^(k+1) − g^(k) = (∫₀¹ G(x^(k) + t α^(k) s^(k)) dt) α^(k) s^(k)   (3.2.7)

    = G(ξ^(k)) α^(k) s^(k)   (3.2.8)

    for some ξ^(k) ∈ (x^(k), x^(k+1)).   (3.2.9)

    Now, if ||d^(k)|| = ||x^(k+1) − x^(k)|| is sufficiently small, then since G(·) is continuous, we can approximate G(ξ^(k)) by G^(k), and thus obtain

    y^(k) ≈ α^(k) G^(k) s^(k).   (3.2.10)

    So, in this case, if exact line searches are carried out (in which case s̄^(k−1) = s^(k−1)), we have the following results for the quadratic problem [Q]:

    Proposition 3.3. If the GPR algorithm is applied to [Q] and exact line searches are used throughout, then

    s^(k)T A s^(j) = 0,   (3.2.11)

    g^(k)T g^(j) = 0   (3.2.12)

    for all k > 1 and j ∈ 1, k − 1.

    Proposition 3.4. If the GPR algorithm is applied to [Q] and an exact line search is carried out at each iteration, then

    (3.2.13)

    for k > 1 and j ∈ 1, k − 1.

    Proposition 3.5. If the GPR algorithm is applied to [Q] with exact line searches, then the algorithm terminates at a stationary point x^(m+1) after m ≤ n iterations, where m is the number of distinct eigenvalues of A.

    Notice, however, that convergence is not, in general, obtained in a finite number of steps if the objective function is not quadratic, and the number of iterations required to attain a given accuracy depends upon the initial estimate x^(1) of the minimizer x*.

    We now consider some relations between the magnitudes of different quantities occurring in the GPR algorithm applied to the general problem [P], which we shall use in the subsequent analysis.

    Proposition 3.6. In the GPR algorithm,

    (a) ||s^(k)||² = ||g^(k)||² + (β_GPR^(k))² ||s̄^(k−1)||²,  k > 1,   (3.2.14a)

    (b) ||g^(k)|| ≤ ||s^(k)||,  k ≥ 1,   (3.2.14b)

    (c) ||s̄^(k)|| ≤ ||s^(k)||,  k ≥ 1.   (3.2.14c)

    Proof. From (3.1.15b), we get, ∀k > 1,

    ||s^(k)||² = (−g^(k) + β_GPR^(k) s̄^(k−1))^T (−g^(k) + β_GPR^(k) s̄^(k−1)) = ||g^(k)||² + (β_GPR^(k))² ||s̄^(k−1)||²,

    since g^(k)T s̄^(k−1) = 0 by (3.1.16); (b) and (c) then follow. ∎


    Proposition 3.8. For all k > 1,

    |β_GPR^(k)| ≤ (M/m) ||g^(k)|| / ||s̄^(k−1)||.   (3.2.19)

    Proof. From (3.1.15d), (3.0.2) and (3.2.18), we have, ∀k > 1,

    |β_GPR^(k)| ≤ ||g^(k)|| ||G^(k)|| ||s̄^(k−1)|| / (m ||s̄^(k−1)||²) ≤ (M/m) ||g^(k)|| / ||s̄^(k−1)||. ∎

    Proposition 3.9. For all k > 1,

    ||s^(k)|| ≤ (1 + M/m) ||g^(k)||.   (3.2.20)

    Proof. From (3.1.15b), we get, ∀k > 1,

    ||s^(k)|| ≤ ||g^(k)|| + |β_GPR^(k)| ||s̄^(k−1)|| ≤ ||g^(k)|| + (M/m) ||g^(k)||,

    using (3.2.19). ∎

    Proposition 3.10. There exists r > 0 such that

    cos θ^(k) ≥ r   (3.2.21)

    for all k, where θ^(k) ≜ (−g^(k) ∧ s^(k)).

    Proof. This follows from (3.2.15) and (3.2.20) with r = (1 + M/m)^{−1} > 0. ∎

    When the GPR algorithm is implemented with exact line search at each step, we have, from (3.2.7), using the exact line search condition (3.1.20),

    α^(k) = ||g^(k)||² / (s^(k)T G(ξ^(k)) s^(k)).   (3.2.22)


    Proposition 3.11. For all k,

    1 / (M (1 + M/m)²) ≤ α^(k) ≤ 1/m.   (3.2.23a)

    Proof. From (3.2.9) and (3.2.18), it follows that

    m ||s^(k)||² ≤ s^(k)T G(ξ^(k)) s^(k) ≤ M ||s^(k)||²   (3.2.23b)

    for all k and hence, by (3.2.22),

    ||g^(k)||² / (M ||s^(k)||²) ≤ α^(k) ≤ ||g^(k)||² / (m ||s^(k)||²).   (3.2.23c)

    Since, by (3.2.14b) and (3.2.20),

    (1 + M/m)^{−1} ≤ ||g^(k)|| / ||s^(k)|| ≤ 1,

    we have (3.2.23a). ∎

    Proposition 3.12. For all k,

    ||g^(k+1)|| ≤ (1 + M/m) ||s^(k)||.   (3.2.24)

    Proof. From (3.2.7) and (3.2.9), we have,

    g^(k+1) = g^(k) + α^(k) G(ξ^(k)) s^(k)

    for some ξ^(k) ∈ (x^(k), x^(k+1)) ⊂ L(x^(1)). Hence, using (3.2.14b), (3.2.23a) and (3.0.2), we conclude that ||g^(k+1)|| ≤ (1 + M/m) ||s^(k)||. ∎

    Proposition 3.13. For all k > 1,

    |β_GPR^(k)| ≤ (1 + M/m) (M/m).   (3.2.25)

    Proof. In view of exact line search, we have,

    s̄^(k−1) = s^(k−1)

    for all k > 1 and hence (3.2.25) is obtained from (3.2.19) using (3.2.24). ∎

    Proposition 3.14. For all k > 1,

    ||s^(k)|| ≤ (1 + M/m)² ||s^(k−1)||.   (3.2.26)

    Proof. This follows from (3.2.20) and (3.2.24). ∎


    3.3 Global Convergence Properties of the Algorithm

    In this section, we discuss global convergence properties of the GPR algorithm applied to [P] under standard line search strategies (as discussed in Section 2.4). Throughout the section, it is assumed that the conditions [AP-1] and [AP-2] hold and the GPR algorithm is initiated at x^(1) satisfying [AP-2].

    We first observe that, in view of Proposition 3.1, the GPR algorithm with exact line search satisfying conditions (3.1.19) and (3.1.20), or with inexact line search satisfying the Wolfe-Powell conditions

    [W-1] f^(k+1) ≤ f^(k) + c_1 α^(k) g^(k)T s^(k),   (3.3.1)

    [W-2] |g^(k+1)T s^(k)| ≤ −c_2 g^(k)T s^(k),   (3.3.2)

    where 0 < c_1 < c_2 < 1, leads to the inequality

    f^(k) − f^(k+1) ≥ ρ (g^(k)T s^(k))² / ||s^(k)||²   (3.3.3)

    with some ρ > 0. This is established in Proposition 3.15 and Proposition 3.16. The proofs depend on the descent property (3.1.17) and are valid for any descent algorithm.

    Proposition 3.15. If an exact line search is performed at each iteration with the GPR algorithm, then the inequality (3.3.3) with some ρ > 0 holds for all k.

    Proof. By the exact line search condition (3.1.19), we have,

    f^(k+1) = min{f(x^(k) + α s^(k)) : 0 < α < α_max^(k)},   (3.3.4a)

    where α_max^(k) is as defined in (2.4.1). But, for 0 < α < α_max^(k), we have, by the Taylor formula,

    f(x^(k) + α s^(k)) = f^(k) + α g^(k)T s^(k) + (1/2) α² s^(k)T G(x^(k) + ν α s^(k)) s^(k)

    for some ν ∈ (0, 1). Since the segment [x^(k), x^(k) + α s^(k)] ⊂ L(x^(1)), so, using (3.0.3), we have,

    f(x^(k) + α s^(k)) ≤ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||²   (3.3.4b)


    for 0 < α < α_max^(k). The quadratic polynomial

    p^(k)(α) ≜ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||²   (3.3.4c)

    attains its minimum value

    p_min^(k) = f^(k) − (g^(k)T s^(k))² / (2M ||s^(k)||²)

    at

    ᾱ^(k) = −g^(k)T s^(k) / (M ||s^(k)||²),

    which is positive by (3.1.17). Since p^(k)(α) is decreasing on (0, ᾱ^(k)) and increasing on (ᾱ^(k), ∞), it follows that ᾱ^(k) < α_max^(k) and hence

    f^(k+1) ≤ p^(k)(ᾱ^(k)) = f^(k) − (g^(k)T s^(k))² / (2M ||s^(k)||²),

    so that (3.3.3) holds with ρ = 1/(2M). ∎


    We next show that acceptable steplengths exist in line searches using the Wolfe-Powell conditions [W-1] and [W-2] for c_1, c_2 satisfying 0 < c_1 < 1/2 and c_1 < c_2 < 1. The proofs, though standard for any descent algorithm, are included for the sake of completeness.

    Lemma 3.17a. For any c_1 ∈ (0, 1/2), steplengths α^(k) > 0 can be determined in a line search at x^(k) with the GPR algorithm satisfying [W-1].

    Proof. For α > 0 such that [x^(k), x^(k) + α s^(k)] ⊂ L(x^(1)), we have, as in the case of (3.3.4b),

    f(x^(k) + α s^(k)) ≤ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||².   (3.3.6a)

    Notice that for any c_1 ∈ (0, 1/2), (3.1.17) implies

    f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||² ≤ f^(k) + c_1 α g^(k)T s^(k) for 0 < α ≤ α̃^(k) ≜ 2(1 − c_1) ||g^(k)||² / (M ||s^(k)||²).   (3.3.6b)

    Since f(x^(k) + α s^(k)) initially decreases along s^(k), either there exists a least positive α_max^(k) such that f(x^(k) + α_max^(k) s^(k)) = f^(k), or else f(x^(k) + α s^(k)) < f^(k) for all α > 0. In the first case, we notice, from (3.3.6a), that

    α_max^(k) ≥ 2 ||g^(k)||² / (M ||s^(k)||²),   (3.3.6c)

    which is greater than α̃^(k) for 0 < c_1 < 1/2. So, in either case, any positive α^(k) ≤ α̃^(k) will satisfy (3.3.1). ∎

    Lemma 3.17b. For any c_2 ∈ (0, 1), steplengths α^(k) > 0 can be determined in a line search at x^(k) with the GPR algorithm satisfying [W-2].


    Proof. The proof is by contradiction. Suppose that for some c_2 ∈ (0, 1) and all α > 0,

    |g(x^(k) + α s^(k))^T s^(k)| ≥ −c_2 g^(k)T s^(k),   (3.3.7a)

    that is, either

    g(x^(k) + α s^(k))^T s^(k) ≥ −c_2 g^(k)T s^(k) > 0   (3.3.7b)

    or

    g(x^(k) + α s^(k))^T s^(k) ≤ c_2 g^(k)T s^(k) < 0.   (3.3.7c)

    We notice that the function

    φ(t) ≜ f(x^(k) + t s^(k))

    has derivative

    φ′(t) = g(x^(k) + t s^(k))^T s^(k).   (3.3.7d)

    If (3.3.7b) holds, then (3.3.7d), (3.3.7b) and (3.1.17) imply that φ′(t) > 0 for all t ≥ 0 and hence φ(0) < φ(α), that is, f(x^(k)) < f(x^(k) + α s^(k)) for all α > 0. This contradicts the fact that s^(k) is a descent direction. So (3.3.7b) cannot hold.

    If (3.3.7c) holds, then it follows from (3.3.7d), (3.3.7c) and (3.1.17) that

    φ(α) − φ(0) = ∫₀^α φ′(t) dt ≤ −c_2 α ||g^(k)||²,

    that is,

    f(x^(k) + α s^(k)) ≤ f^(k) − c_2 α ||g^(k)||²   (3.3.7e)

    for all α > 0. It then follows that x^(k) + α s^(k) ∈ L(x^(1)) for all α > 0. But the continuous function f is bounded on the compact set L(x^(1)). Hence, ∃ N > 0 such that ∀α > 0, |f(x^(k) + α s^(k))| ≤ N. However, for α > (N + f^(k)) / (c_2 ||g^(k)||²),

    f(x^(k) + α s^(k)) < −N.

    This contradiction therefore shows that (3.3.7c) cannot hold, so the Lemma is proved. ∎

    Proposition 3.18. For any c_1, c_2 satisfying 0 < c_1 < 1/2 and c_1 < c_2 < 1, there exists an interval of acceptable steplengths α^(k) > 0 in a line search at x^(k) with the GPR algorithm satisfying [W-1] and [W-2].


    Proof. From Lemma 3.17b and (3.3.5c), it follows that steplengths α^(k) for which [W-2] holds satisfy

    α^(k) ≥ α̲^(k) ≜ (1 − c_2) ||g^(k)||² / (M ||s^(k)||²).   (3.3.8a)

    On the other hand, we have seen in Lemma 3.17a that [W-1] holds for any positive α^(k) ≤ α̃^(k), where α̃^(k) is as defined in (3.3.6b). But clearly,

    α̲^(k) < α̃^(k).   (3.3.8b)

    Hence, for any α^(k) in the interval [α̲^(k), α̃^(k)], both [W-1] and [W-2] hold simultaneously. ∎

    We now look into the convergence properties of the GPR algorithm executed without regular restarts. In this connection, some additional conditions are needed for establishing the convergence criterion

    lim_{k→∞} ||g^(k)|| = 0   (3.3.9a)

    or even the weaker criterion

    lim inf_{k→∞} ||g^(k)|| = 0.   (3.3.9b)

    The next two theorems establish some general conditions for the convergence of the GPR algorithm.

    Theorem 3.19. Suppose that in the GPR algorithm, g^(k) ≠ 0 for all k and at each iteration α^(k) is chosen so as to satisfy (3.3.3) for some ρ > 0. Assume that, in addition to the conditions [AP-1] and [AP-2], the following condition holds:

    [AP-4] The series Σ_{k=1}^∞ cos² θ^(k) is divergent, where θ^(k) ≜ (−g^(k) ∧ s^(k)).

    Then the limit (3.3.9b) is achieved.

    Proof. Suppose that (3.3.9b) does not hold. Then ∃ ε > 0 such that

    ||g^(k)|| ≥ ε   (3.3.10a)


    for all k. It then follows from (3.2.16), (3.3.10a) and (3.3.3) that

    cos² θ^(k) = (g^(k)T s^(k))² / (||g^(k)||² ||s^(k)||²) ≤ (1/(ρ ε²)) (f^(k) − f^(k+1))   (3.3.10b)

    and hence, ∀k ≥ 1,

    Σ_{j=1}^k cos² θ^(j) ≤ (1/(ρ ε²)) (f^(1) − f^(k+1)).   (3.3.10c)

    But the continuous function f(·) is bounded on the compact set L(x^(1)). Letting

    f_* = inf{f(x) : x ∈ L(x^(1))},   (3.3.10d)

    we thus have

    Σ_{j=1}^k cos² θ^(j) ≤ (1/(ρ ε²)) (f^(1) − f_*),   (3.3.10e)

    whence, by the monotone convergence property of positive term series, it follows that Σ_{k=1}^∞ cos² θ^(k) is convergent. This contradiction establishes the theorem. ∎

    Theorem 3.20. If in Theorem 3.19, the condition [AP-4] is replaced by the condition

    [AP-5] the sequence {cos θ^(k)} is bounded away from 0,

    then the limit (3.3.9a) is achieved by the GPR algorithm.

    Proof. For each k, we have,

    f^(k+1) = f^(1) + Σ_{j=1}^k (f^(j+1) − f^(j)).   (3.3.11a)

    Hence, using (3.3.3), (3.1.17) and (3.2.15), we get,

    f^(k+1) ≤ f^(1) − ρ Σ_{j=1}^k ||g^(j)||² cos² θ^(j),   (3.3.11b)


    where θ^(j) is as in [AP-4]. By [AP-5], ∃ e > 0 such that

    cos θ^(j) ≥ e   (3.3.11c)

    for all j, and hence,

    Σ_{j=1}^k ||g^(j)||² ≤ (1/(ρ e²)) (f^(1) − f_*)

    for all k. The series Σ ||g^(j)||² is therefore convergent, so ||g^(k)|| → 0 and the limit (3.3.9a) is achieved. ∎

    Theorem 3.21. Suppose that the GPR algorithm is executed with line searches satisfying (3.3.3) for some ρ > 0 whenever g^(k) ≠ 0. Assume that conditions [AP-1], [AP-2] and [AP-4] hold. Then either a finite sequence {x^(k)} is obtained whose last term x^(m) satisfies g(x^(m)) = 0 or else the sequence {x^(k)} has a limit point x* such that g(x*) = 0. If, instead of [AP-4], the condition [AP-5] is assumed, then g(x*) = 0 for all limit points x* of {x^(k)}.

    Proof. The iteration stops whenever g(x^(m)) = 0 for some m. Suppose now that g^(k) ≠ 0 for any k. Assume that [AP-4] holds. Then, by Theorem 3.19,

    lim inf_{k→∞} ||g^(k)|| = 0,


    and since {x^(k)} is a sequence in the compact set L(x^(1)), so, by standard results of analysis, there exists a subsequence {x^(k_j)} of {x^(k)} such that

    lim_{j→∞} g(x^(k_j)) = 0   (3.3.12a)

    and

    lim_{j→∞} x^(k_j) = x*   (3.3.12b)

    for some x* ∈ L(x^(1)). But then, by the continuity of g,

    g(x*) = lim_{j→∞} g(x^(k_j))

    and hence

    g(x*) = 0

    for the particular limit point x* of {x^(k)}.

    On the other hand, if [AP-4] is replaced by [AP-5], then it follows from Theorem 3.20 that

    lim_{k→∞} g(x^(k)) = 0.

    Hence, for any convergent subsequence {x^(k_j)} of {x^(k)} with lim_{j→∞} x^(k_j) = x*, we have,

    g(x*) = 0. ∎

    It may be remarked that if the sequence {x^(k)} has just one limit point x* (that is, if {x^(k)} is convergent), which is usually the case in practice, then it makes no difference whether we take [AP-4] or [AP-5]. In this case, f(x^(k)) ↓ f(x*) and g(x*) = 0, and therefore x* is a local minimizer of f or possibly a saddle point.

    We conclude this section with some comments about the conditions [AP-4]

    and [AP-5] used in the convergence proofs.

    The condition [AP-4] is much weaker than the condition [AP-5] and is

    the weakest condition that has been used to prove global convergence for CG

    algorithms (Fletcher[F1]). Regarding [AP-5], we note that negligible reductions in function values can occur if the search directions s^(k) are close to being orthogonal

    to the negative gradients, and the condition [AP-5] ensures that this does not

    happen. We have already seen that a sufficient condition for the realization of

    [AP-5] with the GPR algorithm is [AP-3]. Another set of conditions, adapted

    from Liu and Storey[L2], is considered below.


    Proposition 3.22. In the GPR algorithm, under [AP-1] and [AP-2], suppose that ∀k > 1,

    [AP-6a] s^(k−1)T G^(k) s^(k−1) > 0,

    [AP-6b] g^(k)T G^(k) g^(k) > 0,

    [AP-6d] (g^(k)T G^(k) s^(k−1))² ≤ (1 − 1/γ_k) (g^(k)T G^(k) g^(k)) (s^(k−1)T G^(k) s^(k−1)) for some γ_k ≥ 1,

    whenever g^(k) ≠ 0. Then if

    [AP-6e] Σ_{k=1}^∞ 1/(1 + γ_k r_k) = ∞,

    then [AP-4] holds. On the other hand, if

    [AP-6f] lim inf_{k→∞} 1/(1 + γ_k r_k) > 0,

    then [AP-5] holds.

    Proof. Set, ∀k > 1,

    u_k ≜ g^(k)T G^(k) s^(k−1),  t_k ≜ g^(k)T G^(k) g^(k),  v_k ≜ s^(k−1)T G^(k) s^(k−1),   (3.3.13a)

    q_k ≜ g^(k)T s^(k−1) / (g^(k)T g^(k)).   (3.3.13b)

    Then it follows from [AP-6a], [AP-6b] and [AP-6d] that

    1 − u_k² / (t_k v_k) ≥ 1/γ_k > 0.

    Moreover, from (3.1.15c) and (3.1.15d), we obtain, ∀k > 1,

    (3.3.14a)


    which gives, on simplification,

    (3.3.14b)

    Thus,

    (3.3.14c)

    using [AP-6c] and (3.3.13a).

    Now, from (3.1.15b), (3.1.15c) and (3.3.14a), we obtain, ∀k > 1,


    In this connection, it may be noted that if G^(k) is positive-definite, then [AP-6a], [AP-6b] and [AP-6c] hold with r_k ≜ χ^(k), where χ^(k) is the spectral condition number of G^(k), that is, the ratio λ_n^(k)/λ_1^(k) of the largest to the smallest eigenvalues of G^(k). It is possible to see that [AP-6c] and [AP-6d] are verified for r_k = γ_k = r ≥ 1. Then the restrictions [AP-6e] and [AP-6f] are automatically satisfied.

    We further observe that the Zoutendijk condition (Zoutendijk[Z2])

    Σ_{k=1}^∞ cos² θ^(k) ||g^(k)||² < ∞   (3.3.18a)

    is satisfied by the GPR algorithm under conditions [AP-1] and [AP-2] and the line search condition (3.3.3). This is so, because, from (3.3.3), (3.2.16) and (3.1.17), we have,

    f^(k+1) ≤ f^(k) − ρ cos² θ^(k) ||g^(k)||²   (3.3.18b)

    for all k, so that ∀N ≥ 1,

    Σ_{k=1}^N cos² θ^(k) ||g^(k)||² ≤ (1/ρ) (f^(1) − f_*),   (3.3.18c)

    where f_* is as defined by (3.3.10d).

    We now analyse some weakened conditions for the convergence of the GPR algorithm. From (3.3.18a) and (3.2.15), we have

    Σ_{k=1}^∞ ||g^(k)||⁴ / ||s^(k)||² < ∞.   (3.3.18d)

    Hence if the limit (3.3.9b) is not achieved, then since

    ||g^(k)|| ≥ ε > 0 for all k,   (3.3.18e)

    so, by the comparison test,

    Σ_{k=1}^∞ 1 / ||s^(k)||² < ∞,   (3.3.19)

    which requires that ||s^(k)|| → ∞ sufficiently rapidly. Indeed, if ||s^(k)||² = O(k) as k → ∞, then ||s^(k)||² ≤ ck for some c > 0 and hence Σ_k 1/||s^(k)||² = ∞,


    and the failure of (3.3.19) implies that the limit (3.3.9b) is achieved. We discuss below some conditions which ensure this with the GPR algorithm.

    We continue to assume that conditions [AP-1] and [AP-2] are satisfied and the GPR algorithm is initiated at x^(1) as in [AP-2]. By continuity of g(·), we have, for some ϑ > 0,

    ||g^(k)|| ≤ ϑ   (3.3.20)

    for all k ≥ 1.

    Proposition 3.23. For all k ≥ l > 1,

    ||s^(k)||² ≤ ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(k)² + β^(k)² β^(k−1)² + ⋯ + β^(k)² β^(k−1)² ⋯ β^(l)²).   (3.3.21)

    Proof. We have, from (3.2.14a), (3.2.14c) and (3.3.20),

    ||s^(k)||² ≤ ϑ² + β^(k)² ||s^(k−1)||²   (3.3.22a)

    for all k > 1. Consider any l > 1. From (3.3.22a), we obtain, using (3.2.14b) and (3.3.20),

    ||s^(l)||² ≤ ϑ² (||s^(l−1)||² / ||g^(l−1)||²) + β^(l)² ϑ² (||s^(l−1)||² / ||g^(l−1)||²) = ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(l)²)   (3.3.22b)

    and further, assuming (3.3.21) for some k ≥ l,

    ||s^(k+1)||² ≤ ϑ² (||s^(l−1)||² / ||g^(l−1)||²) + β^(k+1)² ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(k)² + β^(k)² β^(k−1)² + ⋯ + β^(k)² β^(k−1)² ⋯ β^(l)²)

    = ϑ² (||s^(l−1)||² / ||g^(l−1)||²) (1 + β^(k+1)² + β^(k+1)² β^(k)² + ⋯ + β^(k+1)² β^(k)² ⋯ β^(l)²).   (3.3.22c)

    From (3.3.22b) and (3.3.22c), by induction, the proposition is verified. ∎


    We now consider the assumption:

    [AP-7] There exists δ > 0 such that ∀k ≥ 1,

    ||g^(k)|| ≥ δ.   (3.3.23)

    We remark that the search direction s^(k) in the GPR algorithm is independent of the length of the auxiliary vector s̄^(k−1).


    |β^(k)| ≤ M ||g^(k)|| ||s̄^(k−1)|| / (ρ δ²) ≜ b,   (3.3.27a)

    where b > 1 for δ sufficiently small.   (3.3.27b)

    On the other hand, (3.2.14c) gives

    ||d^(k−1)|| ≤ λ ⟹ α^(k−1) ||s^(k−1)|| ≤ λ ⟹ ||s^(k−1)|| ≤ λ/a   (3.3.28a)

    for k > 1, where 0 < a ≤ α^(k) exists by Proposition 3.18. So, in this case, we obtain, from (3.3.27a), using (3.3.20), (3.3.23) and (3.3.28a),

    |β^(k)| ≤ M ϑ λ / (ρ δ² a) = 1/b   (3.3.28b)

    if, by (3.3.27b),

    (3.3.28c)

    The above Proposition shows that the GPR algorithm shares the "Property (*)" of Gilbert and Nocedal[G3] under certain conditions which are not too restrictive. The next proposition, adapted from Gilbert and Nocedal[G3], shows that if, in addition, some restriction on the step sizes in the GPR algorithm is imposed, then ||s^(k)||² can grow at most linearly.

Proposition 3.25. Suppose that in the GPR algorithm, $g^{(k)} \ne 0$ for all $k$ and that the conditions [AP-9a] and [AP-9b] are satisfied. Then if

[AP-10] for any $\lambda > 0$ there exist integers $l > 1$ and $\tau \ge 1$ such that, for any index $k \ge l$, the number of indices $i \in [k,\, k+\tau-1]$ for which $\|d^{(i-1)}\| > \lambda$ does not exceed $\tau/2$,

then $\|s^{(k)}\|^2 \le c\,(k - l + 2)$ for $k \ge l$, where $c > 0$ depends on $l$ but not on $k$.

Proof. For $\lambda > 0$ satisfying [AP-9b], consider integers $l > 1$ and $\tau \ge 1$ given by [AP-10]. By Proposition 3.23, we have, for $k > l$,
\[ \|s^{(k)}\|^2 \le c \Big( 1 + \sum_{i=l}^{k} P^{(i)} \Big), \tag{3.3.29} \]
where $c > 0$ depends on $l$ but not on $k$. Consider the product
\[ P^{(i)} \triangleq \beta^{(k)2}\,\beta^{(k-1)2} \cdots \beta^{(i)2} \tag{3.3.30a} \]
of $(k - i + 1)$ factors of the form $\beta^{(t)2}$, where $i \le t \le k$ and $l \le i \le k$.

If $k - i + 1 \le \tau$, then we have, by [AP-9a],
\[ P^{(i)} \le b^{2(k-i+1)} \le b^{2\tau}. \tag{3.3.30b} \]
If $k - i + 1 > \tau$, let $k - i + 1 = m\tau + h$, where $m \ge 1$ and $0 \le h < \tau$, and rewrite $P^{(i)}$ by grouping consecutive $\tau$ factors from the beginning:
\[ P^{(i)} = P_0^{(i)}\,P_1^{(i)} \cdots P_{m-1}^{(i)}\,Q^{(i)}, \tag{3.3.30c} \]
where
\[ P_t^{(i)} \triangleq \beta^{(k_t)2}\,\beta^{(k_t-1)2} \cdots \beta^{(k_{t+1}+1)2}, \tag{3.3.31a} \]
\[ k_t \triangleq k - t\tau, \qquad 0 \le t \le m-1, \tag{3.3.31b} \]
and
\[ Q^{(i)} \triangleq \beta^{(k_m)2}\,\beta^{(k_m-1)2} \cdots \beta^{(i)2}, \tag{3.3.31c} \]
\[ k_m \triangleq k - m\tau, \tag{3.3.31d} \]
there being $\tau$ factors in each $P_t^{(i)}$ and $h$ factors in $Q^{(i)}$ (with $Q^{(i)} = 1$ if $h = 0$).

Let $p_t^{(i)}$ be the number of indices $j \in [k_{t+1}+1,\, k_t]$ such that $\|d^{(j-1)}\| > \lambda$. By [AP-10],
\[ p_t^{(i)} \le \frac{\tau}{2} \tag{3.3.32a} \]
and hence, by [AP-9a] and [AP-9b],
\[ P_t^{(i)} \le \big(b^2\big)^{p_t^{(i)}} \Big( \frac{1}{b^2} \Big)^{\tau - p_t^{(i)}} = \big(b^2\big)^{-(\tau - 2 p_t^{(i)})} \le 1 \tag{3.3.32b} \]
in view of (3.3.32a) and $b > 1$. Moreover, by [AP-9a],
\[ Q^{(i)} \le b^{2h} \le b^{2\tau}. \tag{3.3.32c} \]
Thus, from (3.3.30b), (3.3.30c), (3.3.32b) and (3.3.32c), it follows that $P^{(i)} \le b^{2\tau}$ for each $l \le i \le k$, and hence, from (3.3.29), we obtain, for $k \ge l$,
\[ \|s^{(k)}\|^2 \le c\,\big(1 + b^{2\tau}(k - l + 1)\big) \le c\,b^{2\tau}(k - l + 2), \tag{3.3.33} \]
as $b > 1$. ∎

From the above discussion, we then have the following convergence result:

Theorem 3.26. Suppose that conditions [AP-1] and [AP-2] are satisfied and the GPR algorithm is executed with line searches satisfying (3.3.3) for some $\rho > 0$. Then if $g^{(k)} \ne 0$ for all $k$ and conditions [AP-9a], [AP-9b] and [AP-10] hold, the limit (3.3.9b) is achieved.

Proof. This follows directly from Proposition 3.25 and the preceding discussion. ∎

Of course, in view of Proposition 3.24, we can replace the conditions [AP-9a] and [AP-9b] by conditions [AP-7] and [AP-8] in the above theorem.


3.4 Rate of Convergence

In this section, we analyse the rate of convergence of the GPR algorithm for solving the problem [P] under assumptions [AP-1]-[AP-3] (as stated on page 15 and page 23) and the following additional assumption:

[AP-11] For $x^{(1)}$ as in [AP-2], the level set $L(x^{(1)})$ is convex and there exists $B > 0$ such that for all $x', x'' \in L(x^{(1)})$,
\[ \|G(x') - G(x'')\| \le B\,\|x' - x''\|. \tag{3.4.1} \]

We also assume that the GPR algorithm, initiated at $x^{(1)}$ satisfying [AP-2], is executed with exact line search at each iteration. Our approach follows that in Cohen[C2].

It may be remarked that, in the case of a quadratic objective function, the GPR algorithm with exact line search terminates at the optimal point in at most $n$ iterations. If the objective function is non-quadratic, then finite termination does not occur in general. However, as we shall see, with exact line search the algorithm possesses $n$-step quadratic convergence when reinitialized with a steepest descent direction.

We only consider the case when the GPR algorithm is reinitialized. Let $\phi$ denote the GPR algorithm applied to the general function $f$ with exact line search at each step, as described by (3.1.15) with $s_1^{(k)} = s^{(k)}$.

For each reinitialized point $x^{(k)}$ constructed by $\phi$, let $F^{(k)} : \mathbb{R}^n \to \mathbb{R}$ be the quadratic function defined by
\[ F^{(k)}(x) \triangleq f^{(k)} + g^{(k)T}\big(x - x^{(k)}\big) + \tfrac{1}{2}\big(x - x^{(k)}\big)^T G^{(k)} \big(x - x^{(k)}\big). \tag{3.4.2} \]
Suppose that $\phi_{F^{(k)}}$ denotes the GPR algorithm applied to $F^{(k)}$, starting at $x^{(k)}$ and constructing the iterates $x_{i+1}^{(k)}$ along directions $s_i^{(k)}$ at $x_i^{(k)}$, where
\[ s_1^{(k)} = s^{(k)}, \qquad s_i^{(k)} = -\,g_i^{(k)} + \beta_i^{(k)}\,s_{i-1}^{(k)} \quad \text{for } i > 1, \tag{3.4.3} \]
with $g_i^{(k)} = \nabla F^{(k)}\big(x_i^{(k)}\big)$ and the $\alpha_i^{(k)}$'s determined by exact line search (here, subscripts $i$ denote the iterates for the quadratic $F^{(k)}$).

Lemma 3.29. For $l \ge 0$,
\[ \|G^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|). \tag{3.4.6} \]

Proof. The lemma is trivially true for $l = 0$. For $k \ge 1$ and $l \ge 1$, we have
\[ \|G^{(k+l)} - G^{(k)}\| \le \sum_{j=0}^{l-1} \|G^{(k+j+1)} - G^{(k+j)}\|. \]
But from [AP-11], Proposition 3.11 and Lemma 3.27,
\[ \|G^{(k+j+1)} - G^{(k+j)}\| = \|G(x^{(k+j+1)}) - G(x^{(k+j)})\| \le B\,\|\alpha^{(k+j)}\,s^{(k+j)}\| = O(\|s^{(k+j)}\|) = O(\|s^{(k)}\|), \]
the step lengths $\alpha^{(k+j)}$ being bounded. Hence
\[ \|G^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|) \]
for all $l \ge 0$. ∎

Lemma 3.30. For $l \ge 0$,
\[ \|\bar{G}^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|), \tag{3.4.7} \]
where $\bar{G}^{(k)}$ is as defined by (3.2.8).

Proof. For $k \ge 1$ and $l \ge 0$, we have
\[ \|\bar{G}^{(k+l)} - G^{(k)}\| \le \|\bar{G}^{(k+l)} - G^{(k+l)}\| + \|G^{(k+l)} - G^{(k)}\|. \]
But, by (3.2.8), [AP-11] and (3.2.23a),
\[ \|\bar{G}^{(k+l)} - G^{(k+l)}\| = \Big\| \int_0^1 \big\{ G\big(x^{(k+l)} + t\,\alpha^{(k+l)}\,s^{(k+l)}\big) - G^{(k+l)} \big\}\,dt \Big\| \le \int_0^1 \big\| G\big(x^{(k+l)} + t\,\alpha^{(k+l)}\,s^{(k+l)}\big) - G^{(k+l)} \big\|\,dt = O(\|s^{(k+l)}\|). \]
Hence, by Lemmas 3.27 and 3.29, we have
\[ \|\bar{G}^{(k+l)} - G^{(k)}\| = O(\|s^{(k)}\|) \]
for all $l \ge 0$. ∎

Lemma 3.31. For $0 \le l \le n - 1$,
\[ \|g^{(k+l+1)} - g_{l+2}^{(k)}\| = O(\|g^{(k+l)} - g_{l+1}^{(k)}\|) + O(\|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\|) + O(\|s^{(k)}\|^2). \tag{3.4.8} \]

Proof. For $k \ge 1$ and $0 \le l \le n - 1$, we have, by (3.2.7), (3.2.18) and (3.2.23a),
\[ \begin{aligned} \|g^{(k+l+1)} - g_{l+2}^{(k)}\| &= \|g^{(k+l)} + \alpha^{(k+l)}\,\bar{G}^{(k+l)}\,s^{(k+l)} - g_{l+1}^{(k)} - \alpha_{l+1}^{(k)}\,G^{(k)}\,s_{l+1}^{(k)}\| \\ &\le \|g^{(k+l)} - g_{l+1}^{(k)}\| + \|\bar{G}^{(k+l)}\|\,\|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| + \alpha_{l+1}^{(k)}\,\|\bar{G}^{(k+l)} - G^{(k)}\|\,\|s_{l+1}^{(k)}\| \\ &\le \|g^{(k+l)} - g_{l+1}^{(k)}\| + M\,\|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| + O\big(\|\bar{G}^{(k+l)} - G^{(k)}\|\,\|s^{(k)}\|\big). \end{aligned} \]
Hence, using Lemmas 3.28 and 3.30 (and Lemma 3.27 to bound $\|s_{l+1}^{(k)}\|$ by $O(\|s^{(k)}\|)$), we conclude that (3.4.8) holds for $0 \le l \le n - 1$. ∎

Lemma 3.32. For $0 \le l < n - 1$,
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| = O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|g^{(k+l+1)} - g_{l+2}^{(k)}\|) + O(\|s^{(k)}\|^2). \tag{3.4.9} \]

Proof. For $0 \le l < n - 1$, we have, from (3.1.15) and (3.4.3),
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| \le \|g^{(k+l+1)} - g_{l+2}^{(k)}\| + \|\beta^{(k+l+1)}\,s^{(k+l)} - \beta_{l+2}^{(k)}\,s_{l+1}^{(k)}\|. \tag{3.4.10a} \]
The second term is estimated by writing the two $\beta$-quotients over their common denominator $c_{l+1}$ and bounding each of the resulting terms by means of (3.2.18), (3.2.14b), (3.3.20) and Lemmas 3.27-3.30; this yields a bound (3.4.11) in which every term is of one of the three orders appearing in (3.4.9), since, from (3.4.10b), using (3.2.18),
\[ c_{l+1} \ge m^2\,\|s^{(k+l)}\|^2\,\|s_{l+1}^{(k)}\|^2. \]
Hence, from (3.4.10a) and (3.4.11), using Lemmas 3.27 and 3.29, it follows that
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| = O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|g^{(k+l+1)} - g_{l+2}^{(k)}\|) + O(\|s^{(k)}\|^2) \]
for $0 \le l < n - 1$. ∎

Lemma 3.33. For $0 \le l \le n - 1$,
\[ \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|g^{(k+l)} - g_{l+1}^{(k)}\|) + O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|s^{(k)}\|^2). \tag{3.4.15} \]

Proof. For $k \ge 1$ and $0 \le l \le n - 1$, we have, by (3.2.22),
\[ \alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)} = \frac{\|g^{(k+l)}\|^2\,s^{(k+l)}}{s^{(k+l)T}\,\bar{G}^{(k+l)}\,s^{(k+l)}} - \frac{\|g_{l+1}^{(k)}\|^2\,s_{l+1}^{(k)}}{s_{l+1}^{(k)T}\,G^{(k)}\,s_{l+1}^{(k)}}, \]
where $\bar{G}^{(k)}$ is as defined by (3.2.8). Writing this difference over the common denominator
\[ c_{l+1} = \big(s^{(k+l)T}\,\bar{G}^{(k+l)}\,s^{(k+l)}\big)\big(s_{l+1}^{(k)T}\,G^{(k)}\,s_{l+1}^{(k)}\big) \ge m^2\,\|s^{(k+l)}\|^2\,\|s_{l+1}^{(k)}\|^2 \]
and bounding the resulting terms by means of (3.2.18) and (3.2.14b), we obtain
\[ \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| \le \frac{\vartheta^2}{m^2}\Big\{ 2M\,\|g^{(k+l)} - g_{l+1}^{(k)}\| + 3M\,\|s^{(k+l)} - s_{l+1}^{(k)}\| + \|\bar{G}^{(k+l)} - G^{(k)}\|\,\|s_{l+1}^{(k)}\| \Big\}. \]
Hence, using Lemmas 3.28 and 3.30,
\[ \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|g^{(k+l)} - g_{l+1}^{(k)}\|) + O(\|s^{(k+l)} - s_{l+1}^{(k)}\|) + O(\|s^{(k)}\|^2) \]
for $0 \le l \le n - 1$. ∎

Lemma 3.34. For $0 \le l \le n - 1$,
\[ \text{(i)}\quad \|g^{(k+l)} - g_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2), \tag{3.4.16} \]
\[ \text{(ii)}\quad \|s^{(k+l)} - s_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2), \tag{3.4.17} \]
\[ \text{(iii)}\quad \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2). \tag{3.4.18} \]

Proof. We prove (3.4.16), (3.4.17) and (3.4.18) simultaneously by finite induction on $l \in [0,\, n-1]$.

For $l = 0$, (3.4.16) and (3.4.17) are trivially true, since, by definition of $g_1^{(k)}$ and $s_1^{(k)}$,
\[ g_1^{(k)} = g^{(k)} \tag{3.4.19a} \]
and
\[ s_1^{(k)} = s^{(k)}. \tag{3.4.19b} \]
That (3.4.18) is also true for $l = 0$ follows from Lemma 3.33 using (3.4.19a) and (3.4.19b).

Assume now that (3.4.16), (3.4.17) and (3.4.18) are true for some $0 \le l < n - 1$. It follows that
\[ \|g^{(k+l+1)} - g_{l+2}^{(k)}\| = O(\|s^{(k)}\|^2) \]
by Lemmas 3.31 and 3.33 and the induction hypothesis. Also,
\[ \|s^{(k+l+1)} - s_{l+2}^{(k)}\| = O(\|s^{(k)}\|^2) \]
by Lemmas 3.32 and 3.31 and the induction hypothesis. Moreover, since $l + 1 \le n - 1$, we have, from Lemma 3.33,
\[ \|\alpha^{(k+l+1)}\,s^{(k+l+1)} - \alpha_{l+2}^{(k)}\,s_{l+2}^{(k)}\| = O(\|s^{(k)}\|^2), \]
which completes the induction. ∎

Proposition 3.35. If the iterates $x^{(k)}$, generated by the GPR algorithm, converge to a local solution $x^*$ of [P], then at each iteration,
\[ \|g^{(k)}\| = O(\|x^{(k)} - x^*\|). \tag{3.4.21} \]

Proof. Applying the mean value theorem to $g(\cdot)$, we obtain
\[ g^{(k)} = g(x^*) + G(\xi^{(k)})\big(x^{(k)} - x^*\big) \tag{3.4.22} \]
for some $\xi^{(k)} \in (x^{(k)}, x^*)$. As the sequence $\{x^{(k)}\}$ in the compact set $L(x^{(1)})$ converges to $x^*$, we have $x^* \in L(x^{(1)})$ and hence $\xi^{(k)} \in L(x^{(1)})$, because $L(x^{(1)})$ is convex by [AP-11]. Moreover, since $x^*$ is a local minimizer of $f$, $g(x^*) = 0$. Thus, from (3.4.22), we obtain, using (3.0.2),
\[ \|g^{(k)}\| = \|G(\xi^{(k)})\big(x^{(k)} - x^*\big)\| \le M\,\|x^{(k)} - x^*\|. \]
∎

We now consider the $n$-step quadratic convergence result for the GPR algorithm applied to the problem [P] under the conditions stated in the first paragraph of this section.

Notice that the stationary point $x^*_{F^{(k)}}$ of the quadratic $F^{(k)}$ is given by
\[ \nabla F^{(k)}\big(x^*_{F^{(k)}}\big) = 0, \tag{3.4.23} \]
that is,
\[ x^*_{F^{(k)}} = x^{(k)} - \big(G^{(k)}\big)^{-1} g^{(k)} = \psi_{NR}\big(x^{(k)}\big), \tag{3.4.24} \]
where $\psi_{NR}$ is the Newton-Raphson algorithm applied to $f$. Since the Newton-Raphson algorithm $\psi_{NR}$ is quadratically convergent to the stationary point $x^*$ of $f$, we get
\[ \|x^*_{F^{(k)}} - x^*\| = \|\psi_{NR}(x^{(k)}) - x^*\| = O(\|x^{(k)} - x^*\|^2). \tag{3.4.25} \]

Also, using the fact that the GPR algorithm $\phi_{F^{(k)}}$ reaches the minimum point $x^*_{F^{(k)}}$ of the quadratic $F^{(k)}$ in at most $n$ iterations, we have
\[ x^*_{F^{(k)}} = \phi_{F^{(k)}}\big(x^{(k)}\big) = x_{n+1}^{(k)}. \tag{3.4.26} \]
Hence, using Lemma 3.34, we conclude that
\[ \|x^{(k+n)} - x^*_{F^{(k)}}\| \le \sum_{l=0}^{n-1} \|\alpha^{(k+l)}\,s^{(k+l)} - \alpha_{l+1}^{(k)}\,s_{l+1}^{(k)}\| = O(\|s^{(k)}\|^2). \tag{3.4.27} \]

We assume that the GPR algorithm is restarted every $t$ iterations ($t \ge n$) with the steepest descent direction. In this case, $s^{(kt)}$ will be set equal to $-g^{(kt)}$ every $t$ iterations, and all the lemmas considered previously in this section hold for $k$ an integral multiple of $t$. The next theorem gives the $n$-step quadratic convergence result for the GPR algorithm applied to [P] with reinitialization every $t$ iterations when an exact line search is adopted at each step.

Theorem 3.36. For the sequence $\{x^{(k)}\}$ generated by the GPR algorithm restarted every $t$ steps with the steepest descent direction,
\[ \limsup_{k \to \infty} \frac{\|x^{(kt+n)} - x^*\|}{\|x^{(kt)} - x^*\|^2} \le C < \infty \tag{3.4.28} \]
for some constant $C$, where $x^*$ is a minimum point of $f$ on $\mathbb{R}^n$.

Proof. For the restart indices $kt$, we obtain, as in (3.4.25) and (3.4.27),
\[ \|x^*_{F^{(kt)}} - x^*\| = O(\|x^{(kt)} - x^*\|^2) \tag{3.4.29} \]
and
\[ \|x^{(kt+n)} - x^*_{F^{(kt)}}\| = O(\|s^{(kt)}\|^2). \tag{3.4.30} \]
Because $s^{(kt)} = -g^{(kt)}$, (3.4.30) becomes
\[ \|x^{(kt+n)} - x^*_{F^{(kt)}}\| = O(\|g^{(kt)}\|^2) = O(\|x^{(kt)} - x^*\|^2) \tag{3.4.31} \]
due to Proposition 3.35. So, from (3.4.29) and (3.4.31), we obtain
\[ \|x^{(kt+n)} - x^*\| \le \|x^{(kt+n)} - x^*_{F^{(kt)}}\| + \|x^*_{F^{(kt)}} - x^*\| = O(\|x^{(kt)} - x^*\|^2), \]
and this completes the proof of (3.4.28). ∎
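The rate (3.4.28) can be observed empirically by recording the error once per restart cycle. The following diagnostic sketch is illustrative only; `iterates` and `x_star` are assumed to come from a run of the restarted algorithm on a test problem with known minimizer:

```python
import numpy as np

def n_step_quadratic_ratios(iterates, x_star, t, n):
    """Ratios ||x_{kt+n} - x*|| / ||x_{kt} - x*||^2 over restart cycles;
    a bounded sequence of ratios is consistent with (3.4.28)."""
    ratios = []
    k = 0
    while k * t + n < len(iterates):
        e0 = np.linalg.norm(iterates[k * t] - x_star)
        en = np.linalg.norm(iterates[k * t + n] - x_star)
        if e0 > 0.0:
            ratios.append(en / e0**2)
        k += 1
    return ratios
```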

    3.5 Characteristic Behaviour and Basic Algorithm


In this section, we discuss the characteristic behaviour of the GPR algorithm and formulate an implementable version of it. This is almost identical to the traditional CG algorithm, differing only in the computation of the search directions.

We first compare the search directions of the GPR algorithm with those of the generalized CG method of Liu and Storey[L2], as described in (2.3.18), and of the memoryless BFGS-QN (abbreviated as MQN) method of Shanno[S4]. To distinguish, we denote these by $s_{GPR}^{(k)}$, $s_{LS}^{(k)}$ and $s_{MQN}^{(k)}$ respectively.

As in (2.3.18),
\[ s_{LS}^{(k)} = -\,g^{(k)} + \frac{g^{(k)T}\,G^{(k)}\,g^{(k)}}{g^{(k)T}\,G^{(k)}\,s^{(k-1)}}\; s^{(k-1)}, \tag{3.5.1} \]
and hence, using (3.1.15c) and (3.1.16), we have
\[ s_{GPR}^{(k)} = \frac{1}{s^{(k-1)T}\,G^{(k)}\,s^{(k-1)}} \left\{ -\big(s^{(k-1)T}\,G^{(k)}\,s^{(k-1)}\big)\, g^{(k)} + \big(g^{(k)T}\,G^{(k)}\,s^{(k-1)}\big)\, s^{(k-1)} \right\}, \]
so that both directions lie in the plane spanned by $g^{(k)}$ and $s^{(k-1)}$.

A corresponding comparison can be made with the MQN direction, which can be written in the form (3.5.7) with a free scaling parameter $\theta^{(k-1)}$ (3.5.7b). Now, if we choose, in particular, $\theta^{(k-1)} = 1/\big(y^{(k-1)T}\,y^{(k-1)}\big)$, then (3.5.7) immediately reduces to the MQN search direction of Shanno[S4].

If the product $G^{(k)} s^{(k-1)}$ is approximated, for $k > 1$, by the backward finite-difference formula
\[ G^{(k)}\,s^{(k-1)} \approx \frac{g^{(k)} - \bar{g}^{(k-1)}}{\delta} \tag{3.5.10} \]
with $\bar{g}^{(k-1)} \triangleq g\big(x^{(k)} - \delta\,s^{(k-1)}\big)$ and $\delta$ any suitable small positive number, then the Hessian matrix $G^{(k)}$ itself need not be computed or stored. This way of avoiding

matrix computation and storage can result in significant savings on large-scale problems, and may be essential when it is not possible to store $G^{(k)}$. The linear algebra required to obtain the product on the left-hand side of (3.5.10) is also reduced. On the other hand, this gain is obtained at the expense of one additional gradient evaluation per iteration. Now, considering (3.1.16) and (3.5.10), (3.1.15d) becomes, for $k > 1$,
\[ \beta_{GPR}^{(k)} = \frac{g^{(k)T}\big(g^{(k)} - \bar{g}^{(k-1)}\big)}{s^{(k-1)T}\big(g^{(k)} - \bar{g}^{(k-1)}\big)}. \tag{3.5.11} \]
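Equation (3.5.10) is straightforward to realise in code. The sketch below is a minimal illustration (not the thesis's implementation); a NumPy gradient routine `grad` is assumed, and the function name is ours:

```python
import numpy as np

def hess_vec(grad, x, s, delta):
    """Backward-difference approximation to G(x) @ s, as in (3.5.10):
    one extra gradient evaluation replaces any computation or storage
    of the n-by-n Hessian matrix."""
    return (grad(x) - grad(x - delta * s)) / delta
```

Note that in the quotient (3.5.11) the factor $1/\delta$ cancels between numerator and denominator, so only the difference $g^{(k)} - \bar{g}^{(k-1)}$ is actually needed there.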

An implementable algorithm for the GPR algorithm with natural restart and based on the search direction given in (3.1.15b) is stated below:

Algorithm: GPR1

Step 1. Let $x^{(1)}$ be an estimate of a minimizer $x^*$ of $f$.

Step 2. Set $k = 1$ and compute $s^{(1)} = -g^{(1)}$.

Step 3. Line search: compute $x^{(k+1)} = x^{(k)} + \alpha^{(k)} s^{(k)}$ and then compute $g^{(k+1)}$.

Step 4. If $\|g^{(k+1)}\| < \epsilon$, take $x^{(k+1)}$ as $x^*$ and stop. Otherwise go to Step 5.

Step 5. If $k + 1 > n$ (with $n \ge 2$), then go to Step 11. Otherwise go to Step 6.

Step 6. Compute $s^{(k)T} s^{(k)}$.

Step 7. With $\delta = \min\big(1,\ \sqrt{\eta}\,/\sqrt{s^{(k)T} s^{(k)}}\big)$, where $\eta$ is a small positive constant, compute $\bar{g}^{(k)} = g\big(x^{(k+1)} - \delta\,s^{(k)}\big)$.

Step 8. Compute $\beta^{(k+1)}$ as in (3.5.11), using $g^{(k+1)}$ and $\bar{g}^{(k)}$.

Step 9. Compute the new search direction $s^{(k+1)}$ from (3.1.15b).

Step 10. Set $k = k + 1$ and go to Step 3.

Step 11. Set $x^{(1)} = x^{(k+1)}$ and repeat from Step 2.

To implement the GPR algorithm requires approximately $5n + 3$ double-precision words of working storage and $O(n)$ arithmetic operations per iteration.
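To make the flow of GPR1 concrete, here is a Python sketch of the whole loop. It is a reconstruction under stated assumptions, not the thesis code: the Armijo backtracking routine stands in for the actual line search of Chapter 7, the restart bookkeeping compresses Steps 5 and 11, and the direction update follows the form (4.1.8) of the next chapter, which keeps $g^{(k)T} s^{(k)} = -\|g^{(k)}\|^2$:

```python
import numpy as np

def backtracking(f, x, g, s, alpha=1.0, rho=0.5, c=1e-4, max_halve=60):
    """Simple Armijo backtracking; a stand-in for the thesis line search."""
    fx, gs = f(x), np.dot(g, s)
    for _ in range(max_halve):
        if f(x + alpha * s) <= fx + c * alpha * gs:
            break
        alpha *= rho
    return alpha

def gpr1(f, grad, x, eps=1e-8, eta=1e-12, max_iter=10_000):
    """Schematic of Algorithm GPR1 with natural restart every n steps."""
    n = x.size
    g = grad(x)
    s = -g                                 # Step 2: steepest-descent start
    k = 0                                  # iterations since last restart
    for _ in range(max_iter):
        alpha = backtracking(f, x, g, s)   # Step 3: line search
        x = x + alpha * s
        g = grad(x)
        if np.linalg.norm(g) < eps:        # Step 4: convergence test
            return x
        k += 1
        if k >= n:                         # Steps 5/11: natural restart
            s, k = -g, 0
            continue
        # Step 7: backward-difference point, delta = min(1, sqrt(eta)/||s||)
        delta = min(1.0, np.sqrt(eta) / np.linalg.norm(s))
        y = g - grad(x - delta * s)        # ~ delta * G(x) s, per (3.5.10)
        # Step 8: generalized PR beta; delta cancels in the quotient (3.5.11)
        beta = np.dot(g, y) / np.dot(s, y)
        # Step 9: new direction; the q-term enforces g^T s = -||g||^2
        # (this exact form is our reconstruction of (3.1.15b) via (4.1.8))
        q = np.dot(g, s) / np.dot(g, g)
        s = -(1.0 + beta * q) * g + beta * s
    return x
```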

CHAPTER 4 : SOME MODIFICATIONS OF THE GPR ALGORITHM AND THEIR IMPLEMENTATIONS

In this chapter, we consider several modifications of the GPR algorithm to try to improve its convergence properties and computational efficiency, and discuss their implementations for solving the general nonlinear problem

[P] Minimize $f(x)$, $x \in \mathbb{R}^n$.

We also discuss the theoretical and algorithmic behaviour of these modified algorithms.

As in the previous chapter, we assume that the objective function $f$ satisfies the basic conditions [AP-1] and [AP-2] of that chapter and that the GPR algorithms are initiated at $x^{(1)}$ satisfying [AP-2]. Other conditions will be added whenever they are required.

    whenever they are required.

A CG algorithm with exact line search finds the minimum of a convex quadratic function of $n$ variables in at most $n$ iterations. Such an algorithm with periodic restart is often globally convergent as well as $n$-step quadratically convergent when the line search is taken to be asymptotically exact (Luksan[L3] and Baptist and Stoer[B6]). Nevertheless, without regular restart, the PR algorithm can cycle infinitely without approaching an optimal point, or can sometimes slow down away from the optimal point (Powell[P4, P2, P5]), because a very small step $\|x^{(k+1)} - x^{(k)}\|$ is taken at each iteration. The GPR algorithm is approximately the same as the PR algorithm when working with an exact line search, and we have noticed that, in general, it may require a very large number of iterations to approach the solution point unless a restart is made occasionally with the steepest descent direction. So, we propose some restarting strategies for


the GPR algorithm which will hopefully improve its computational efficiency and reduce its CPU time.

    4.1 GPR Algorithm with Non-negative Beta

Powell[P4] has shown that there are functions $f$ satisfying conditions [AP-1] and [AP-2] for which the PR algorithm, even with exact line search and exact arithmetic, generates gradients which stay bounded away from zero. Powell's example requires that some consecutive search directions become almost opposite, and as this can only occur, in the case of exact line search, when $\beta_{PR}^{(k)} < 0$, Powell[P5] suggests a new implementation of the PR algorithm with a non-negative value for $\beta_{PR}^{(k)}$ taken at each iteration. Motivated by Powell's suggestion, and by the fact that $\beta_{GPR}^{(k)} \approx \beta_{PR}^{(k)}$ when working with exact line search, we propose an implementation of the GPR algorithm with $\beta_{GPR}^{(k)} \ge 0$ to prevent cycling. This modified algorithm will be called the GPR Algorithm with Non-negative Beta (GPR+ Algorithm, in short).

The search directions in the GPR+ algorithm, obtained from (3.1.15), are
\[ s_{GPR+}^{(k)} = \begin{cases} -\,g^{(k)} & \text{for } k = 1, \\ \text{the direction (3.1.15b) with } \beta_{GPR}^{(k)} \text{ replaced by } \beta_{GPR+}^{(k)} & \text{for } k > 1, \end{cases} \tag{4.1.1} \]
where $\beta_{GPR+}^{(k)}$ is given by
\[ \beta_{GPR+}^{(k)} = \max\big\{ \beta_{GPR}^{(k)},\ 0 \big\} \tag{4.1.2} \]
on all iterations, $\beta_{GPR}^{(k)}$ being given by (3.1.15d). Thus, the GPR+ search directions are given by
\[ s_{GPR+}^{(k)} = \begin{cases} s_{GPR}^{(k)} & \text{if } \beta_{GPR}^{(k)} > 0, \\ -\,g^{(k)} & \text{otherwise,} \end{cases} \tag{4.1.3} \]
with $k \ge 1$. It follows that the GPR+ algorithm has an "automatic" restarting procedure depending on the values of $\beta_{GPR}^{(k)}$.

    Since (4.1.1) and (4.1.3) are equivalent, it is immaterial which particular

    form is used to describe the GPR+ search directions, and we shall use (4.1.3) in

    our implementation.
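In code, the GPR+ modification is a one-line clipping of $\beta$. A sketch, reusing the hypothetical names of the GPR1 sketch in Section 3.5 (the $q$-term mirrors the direction form used there and is our assumption):

```python
import numpy as np

def gpr_plus_direction(g, s_prev, beta_gpr):
    """GPR+ direction per (4.1.3): keep the GPR direction when
    beta_GPR > 0, otherwise restart along the steepest descent -g."""
    if beta_gpr > 0.0:
        q = np.dot(g, s_prev) / np.dot(g, g)
        return -(1.0 + beta_gpr * q) * g + beta_gpr * s_prev
    return -g
```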


It is easy to see that the GPR+ algorithm preserves the descent property (3.1.17) at each iteration. Moreover, it follows from the construction of $s_{GPR+}^{(k)}$ that the GPR+ algorithm inherits all the properties of the GPR algorithm.

We now formalize a convergence theorem for the GPR+ algorithm. For convenience of notation, we continue to use $s^{(k)}$, $\beta^{(k)}$, etc. for the quantities generated by the GPR+ algorithm, and we write $u^{(k)} \triangleq s^{(k)}/\|s^{(k)}\|$ for the normalized search direction.

Proposition 4.1. Suppose that
\[ \liminf_{k \to \infty} \|g^{(k)}\| > 0. \tag{4.1.5} \]
Then
\[ \sum_{k=2}^{\infty} \|u^{(k)} - u^{(k-1)}\|^2 < \infty. \tag{4.1.6} \]

Proof. By (4.1.5), there exists $\varepsilon > 0$ such that
\[ \varepsilon \le \|g^{(k)}\| \le \vartheta \quad \text{for all } k \ge 1, \tag{4.1.7a} \]
where $\vartheta$ is as defined in (3.3.20). Moreover, as in Proposition 3.24, there exists $b > 1$ such that
\[ |\beta^{(k)}| \le b \tag{4.1.7b} \]
for all $k > 1$.

From (3.1.15b) and (3.1.15c), we obtain
\[ s^{(k)} = -\big(1 + \beta^{(k)} q_k\big)\,g^{(k)} + \beta^{(k)}\,s^{(k-1)} \tag{4.1.8} \]
and hence
\[ u^{(k)} = r^{(k)} + \delta_k\,u^{(k-1)} \tag{4.1.9} \]
for all $k > 1$, where
\[ q_k \triangleq \frac{g^{(k)T}\,s^{(k-1)}}{g^{(k)T}\,g^{(k)}}, \tag{4.1.10a} \]
\[ r^{(k)} \triangleq -\,\big(1 + \beta^{(k)} q_k\big)\,\frac{g^{(k)}}{\|s^{(k)}\|}, \tag{4.1.10b} \]
\[ \delta_k \triangleq \beta^{(k)}\,\frac{\|s^{(k-1)}\|}{\|s^{(k)}\|} \ge 0, \tag{4.1.10c} \]
and hence,
\[ \text{(i)}\quad r^{(k)T}\,u^{(k)} = \|r^{(k)}\|^2 + \delta_k\,r^{(k)T}\,u^{(k-1)}, \tag{4.1.11} \]
\[ \text{(ii)}\quad 1 = u^{(k)T}\,u^{(k)} = \|r^{(k)}\|^2 + 2\,\delta_k\,r^{(k)T}\,u^{(k-1)} + \delta_k^2, \tag{4.1.12a} \]
\[ \text{(iii)}\quad u^{(k)} - u^{(k-1)} = r^{(k)} + (\delta_k - 1)\,u^{(k-1)}, \tag{4.1.12b} \]
so that
\[ \begin{aligned} \|u^{(k)} - u^{(k-1)}\|^2 &= \|r^{(k)}\|^2 + 2(\delta_k - 1)\,r^{(k)T}\,u^{(k-1)} + (\delta_k - 1)^2 \\ &= 2\big(1 - \delta_k - r^{(k)T}\,u^{(k-1)}\big) \\ &= 2\big(1 - \delta_k^2 - (1 + \delta_k)\,r^{(k)T}\,u^{(k-1)}\big)\big/(1 + \delta_k) \\ &= 2\big(r^{(k)T}\,u^{(k)} - r^{(k)T}\,u^{(k-1)}\big)\big/(1 + \delta_k) \\ &= 2\,r^{(k)T}\big(u^{(k)} - u^{(k-1)}\big)\big/(1 + \delta_k) \\ &\le 2\,\|r^{(k)}\|\,\|u^{(k)} - u^{(k-1)}\|\big/(1 + \delta_k), \end{aligned} \tag{4.1.12c} \]

using the Cauchy-Schwarz inequality. Hence,
\[ \|u^{(k)} - u^{(k-1)}\| \le \frac{2\,\|r^{(k)}\|}{1 + \delta_k} \le 2\,\|r^{(k)}\|, \tag{4.1.13} \]
since $\delta_k \ge 0$. But, from (4.1.10b),
\[ \|r^{(k)}\| = \big|1 + \beta^{(k)} q_k\big|\;\frac{\|g^{(k)}\|}{\|s^{(k)}\|} \le c\,\frac{\|g^{(k)}\|}{\|s^{(k)}\|} \tag{4.1.14} \]
for some $c \ge 1$, since, by (4.1.7a), (4.1.7b), (4.1.10a), (3.1.17) and (3.1.20) or (3.3.2), the factor $|1 + \beta^{(k)} q_k|$ is bounded above. From (4.1.13), (4.1.14) and (3.2.15), it then follows that for all $k > 1$,
\[ \|u^{(k)} - u^{(k-1)}\| \le 2c\,\cos\theta^{(k)}, \tag{4.1.15} \]
where $\theta^{(k)} \triangleq \angle\big(-g^{(k)},\, s^{(k)}\big)$. So, if (4.1.6) fails, then
\[ \sum_{k=1}^{\infty} \cos^2\theta^{(k)} = \infty \tag{4.1.16} \]
and hence, by Theorem 3.19, the assumption (4.1.5) fails. This completes the proof. ∎

Theorem 4.2. Suppose that conditions [AP-7] and [AP-8] hold in addition to the conditions stated previously. Then the limit
\[ \lim_{k \to \infty} \|g^{(k)}\| = 0 \tag{4.1.17} \]
is achieved by the GPR+ algorithm.

Proof. First suppose that the condition [AP-10] is satisfied.

We see, from Proposition 3.24, that conditions [AP-9a] and [AP-9b] are satisfied. So, by Proposition 3.25, there exist an integer $l > 1$ and a constant $c > 0$ (depending on $l$) such that for $k \ge l$,
\[ \|s^{(k)}\|^2 \le c\,(k - l + 2). \tag{4.1.18a} \]
It follows that
\[ \sum_{k=1}^{\infty} \frac{1}{\|s^{(k)}\|^2} = \infty, \tag{4.1.18b} \]
and hence (4.1.17) is achieved. For, if not, then there exists $\varepsilon > 0$ such that $\|g^{(k)}\| \ge \varepsilon$ for all $k \ge 1$, and so, since by (3.2.15) $\cos^2\theta^{(k)}\,\|g^{(k)}\|^2 = \|g^{(k)}\|^4/\|s^{(k)}\|^2 \ge \varepsilon^4/\|s^{(k)}\|^2$, we have
\[ \sum_{k=1}^{\infty} \cos^2\theta^{(k)}\,\|g^{(k)}\|^2 = \infty, \tag{4.1.18c} \]
contradicting the Zoutendijk condition (3.3.18a).

We now consider the case when [AP-10] is not satisfied. In this case:

[AP-10]* There exists $\lambda > 0$ such that for all integers $l > 1$ and $\tau \ge 1$, there exists an integer $k \ge l$ such that the number of indices $i \in [k,\, k+\tau-1]$ for which $\|d^{(i-1)}\| > \lambda$ is greater than $\tau/2$.

Assume that
\[ \liminf_{k \to \infty} \|g^{(k)}\| > 0. \tag{4.1.19} \]
The sequence $\{x^{(k)}\}$ being bounded, there exists $B > 0$ such that $\|x^{(k)}\| \le B$ for $k \ge 1$. With $\lambda$ as in [AP-10]*, define the integer $\tau \ge 1$ by
\[ \frac{8B}{\lambda} \le \tau < \frac{8B}{\lambda} + 1. \tag{4.1.20a} \]
By Proposition 4.1, in view of (4.1.19), we have (4.1.6) and hence, with $\tau$ as above, there exists an integer $l > 1$ such that
\[ \sum_{k=l}^{\infty} \|u^{(k)} - u^{(k-1)}\|^2 \le \frac{1}{4\tau}. \tag{4.1.20b} \]

If we select $k \ge l$ as in [AP-10]*, then, since
\[ x^{(k+\tau-1)} - x^{(k-1)} = \sum_{i=k}^{k+\tau-1} d^{(i-1)} = \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,u^{(i-1)}, \]
we have
\[ \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,u^{(k-1)} = x^{(k+\tau-1)} - x^{(k-1)} - \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,\big(u^{(i-1)} - u^{(k-1)}\big). \]
Hence, taking norms,
\[ \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\| \le 2B + \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\|\,\|u^{(i-1)} - u^{(k-1)}\|. \tag{4.1.20c} \]
But for $i \in [k,\, k+\tau-1]$, using the Cauchy-Schwarz inequality and (4.1.20b),
\[ \|u^{(i-1)} - u^{(k-1)}\| \le \sum_{j=k}^{i-1} \|u^{(j)} - u^{(j-1)}\| \le \Big( \tau \sum_{j=k}^{i-1} \|u^{(j)} - u^{(j-1)}\|^2 \Big)^{1/2} \le \frac{1}{2}, \]
and hence, from (4.1.20c), we obtain
\[ 2B \ge \frac{1}{2} \sum_{i=k}^{k+\tau-1} \|d^{(i-1)}\| \ge \frac{1}{2} \sum_{\substack{i:\ \|d^{(i-1)}\| > \lambda}} \|d^{(i-1)}\| > \frac{1}{2}\cdot\frac{\tau}{2}\cdot\lambda = \frac{\lambda\tau}{4}, \]
in view of [AP-10]*. Thus, we have
\[ \tau < \frac{8B}{\lambda}, \]
contradicting (4.1.20a). This contradiction leads to the denial of (4.1.19), and so (4.1.17) is achieved. ∎

For implementing the GPR+ algorithm, we have the following modified version of Algorithm GPR1:

Algorithm: GPR2

This is the same as Algorithm GPR1 except that an additional step, namely Step 8a, is inserted between Step 8 and Step 9:

Step 8a. If $\beta_{GPR}^{(k+1)} > 0$, then go to Step 9. Otherwise go to Step 11.

    4.2 GPR Algorithm with Powell Restart

Regarding the GPR+ algorithm, we notice from (3.2.10a) that, when working with exact line searches, $\beta_{GPR}^{(k)} \ge 0$ if and only if $g^{(k)T}\big(g^{(k)} - g^{(k-1)}\big) \ge 0$, that is, if and only if
\[ g^{(k)T}\,g^{(k-1)} \le \|g^{(k)}\|^2. \tag{4.2.1} \]
So, the GPR+ algorithm with exact line search induces a restart in the steepest descent direction whenever (4.2.1) is violated, that is, when
\[ g^{(k)T}\,g^{(k-1)} > \|g^{(k)}\|^2. \tag{4.2.2} \]

The condition (4.2.2) is a less restrictive restarting criterion than the Powell restarting criterion (Powell[P2])
\[ \big|g^{(k)T}\,g^{(k-1)}\big| \ge 0.2\,\|g^{(k)}\|^2. \tag{4.2.3} \]
Even though Powell's criterion (4.2.3) was designed to ensure the convergence of Beale's restarting algorithm (Powell[P2]), we consider its use with the GPR algorithm in the hope of improving efficiency and convergence. The resulting algorithm, called the Powell Restarting GPR Algorithm (PGPR Algorithm, in short), is just as in (3.1.15) except that $\beta_{PGPR}^{(k)} = \beta_{GPR}^{(k)}$ if
\[ \big|g^{(k)T}\,g^{(k-1)}\big| < 0.2\,\|g^{(k)}\|^2 \tag{4.2.4} \]
and $\beta_{PGPR}^{(k)} = 0$ if (4.2.3) occurs.

Moreover, we find, in view of (3.1.15), that the PGPR search direction $s_{PGPR}^{(k)}$ can be written as
\[ s_{PGPR}^{(k)} = \begin{cases} s_{GPR}^{(k)} & \text{if (4.2.4) holds,} \\ -\,g^{(k)} & \text{if (4.2.3) holds,} \end{cases} \tag{4.2.5} \]
for $k \ge 1$, and so the descent property
\[ g^{(k)T}\,s_{PGPR}^{(k)} < 0 \tag{4.2.6} \]
is satisfied at all iterations. It also follows that all the standard properties of the GPR algorithm apply directly to the PGPR algorithm under the same assumptions as those imposed on the GPR algorithm. For instance, we can see that the global convergence results, stated in Theorem 3.19 and Theorem 3.20, hold for the PGPR algorithm provided we suppose that $g^{(k)} \ne 0$ for all $k$ and that the line search satisfies (3.3.3) with some $\rho > 0$.

It may be observed from (2.3.15), (3.2.10a) and (4.2.4) that, in implementing the PGPR algorithm with exact line search, a restart is not induced if
\[ \big|\beta_{GPR}^{(k)} - \beta_{FR}^{(k)}\big| \le 0.2\,\beta_{FR}^{(k)}, \tag{4.2.7a} \]
that is, if
\[ 0.8\,\beta_{FR}^{(k)} \le \beta_{GPR}^{(k)} \le 1.2\,\beta_{FR}^{(k)}. \tag{4.2.7b} \]

Thus, in such implementations of the PGPR algorithm, satisfaction of (4.2.7b) is a measure of the adequacy of $\beta_{GPR}^{(k)}$.

Since the gradients are orthogonal when the GPR algorithm is applied to a quadratic function $q(\cdot)$ and exact line searches are performed (see Proposition 3.3), and since (4.2.3) decides whether enough orthogonality between $g^{(k-1)}$ and $g^{(k)}$ has been lost to warrant a restart, it is necessary for the implementation of the PGPR algorithm on a quadratic function that the line searches be almost exact at all iterations. This means that the line search at each iteration must perform at least one cubic interpolation, which is, in fact, very expensive in terms of function and gradient evaluations.

To implement the PGPR algorithm, we modify Algorithm GPR1 as described below:

Algorithm: GPR3

Just add the following new step, namely Step 5a, to Algorithm GPR1 in between Step 5 and Step 6.

Step 5a. If $\big|g^{(k+1)T}\,g^{(k)}\big| > 0.2\,\|g^{(k+1)}\|^2$, then go to Step 11. Otherwise go to Step 6.
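A sketch of the inserted test in code (the function name is ours; the criterion itself is exactly (4.2.3)):

```python
import numpy as np

def powell_restart(g_new, g_old):
    """Step 5a / criterion (4.2.3): signal a restart when successive
    gradients have lost enough orthogonality, i.e. when
    |g_{k+1}^T g_k| > 0.2 * ||g_{k+1}||^2."""
    return abs(np.dot(g_new, g_old)) > 0.2 * np.dot(g_new, g_new)
```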

    4.3 Shanno's Angle-Test Restarting GPR Algorithm

Shanno's angle-test restart procedure (Shanno[S2]) for CG algorithms sets up a switching criterion for restarting such algorithms with a steepest descent direction: the current direction is retained only when the cosine of the angle between the search direction and the negative gradient is at least a constant multiple of the cosine of the angle between the FR search direction and the negative gradient. It thereby assures the convergence of the modified algorithm, as the FR algorithm is globally convergent. We propose, therefore, an implementation of the GPR algorithm which incorporates the angle-test restart. This new implementation will be referred to as Shanno's Angle-Test Restarting GPR Algorithm (SGPR Algorithm, in short).

Shanno's procedure is based on consideration of the FR process with exact line search (so that $g^{(k)T} s_{FR}^{(k-1)} = 0$). Thus we have
\[ s_{FR}^{(k)} = -\,g^{(k)} + \beta_{FR}^{(k)}\,s_{FR}^{(k-1)} \tag{4.3.1} \]
for $k \ge 1$ and
\[ \cos^2\theta_{FR}^{(k)} = \frac{\|g^{(k)}\|^2}{\|s_{FR}^{(k)}\|^2} \tag{4.3.2} \]
for $k > 1$, where $\theta_{FR}^{(k)} \triangleq \angle\big(-g^{(k)},\, s_{FR}^{(k)}\big)$. From (4.3.1) and the above orthogonality, it follows that for all $k > 1$,
\[ \|s_{FR}^{(k)}\|^2 = \|g^{(k)}\|^2 + \beta_{FR}^{(k)2}\,\|g^{(k-1)}\|^2 + \beta_{FR}^{(k)2}\,\beta_{FR}^{(k-1)2}\,\|g^{(k-2)}\|^2 + \cdots + \beta_{FR}^{(k)2}\,\beta_{FR}^{(k-1)2} \cdots \beta_{FR}^{(2)2}\,\|g^{(1)}\|^2. \tag{4.3.3} \]
Hence, using (2.3.15), we obtain
\[ \|s_{FR}^{(k)}\|^2 = \|g^{(k)}\|^4 \sum_{l=1}^{k} \|g^{(l)}\|^{-2}, \]
and so, from (4.3.2), we have
\[ \cos^2\theta_{FR}^{(k)} = \frac{1}{\|g^{(k)}\|^2 \sum_{l=1}^{k} \|g^{(l)}\|^{-2}}. \]
Assuming that $g^{(k)} \ne 0$ for all $k$, define
\[ \gamma^{(k)2} \triangleq \frac{\tau}{\|g^{(k)}\|^2 \sum_{l=1}^{k} \|g^{(l)}\|^{-2}} \]
with $\tau > 0$. We now use the test
\[ \cos^2\theta_{GPR}^{(k)} \ge \gamma^{(k)2}: \]
the direction $s_{GPR}^{(k)}$ is accepted if this test holds, and otherwise a restart is made with the steepest descent direction $-g^{(k)}$.
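A sketch of this switching rule in code (our reconstruction; the accumulator `inv_norm_sum`, the acceptance polarity, and all names are assumptions based on the formulas above):

```python
import numpy as np

def shanno_angle_test(g, s, inv_norm_sum, tau):
    """Compare cos^2 of the angle between s and -g with
    gamma^2 = tau / (||g||^2 * sum_l ||g_l||^-2), the scaled FR value.
    `inv_norm_sum` accumulates ||g_l||^-2 over the iterations so far.
    Returns True to keep s, False to restart with -g."""
    gg = np.dot(g, g)
    cos2 = np.dot(g, s)**2 / (gg * np.dot(s, s))
    gamma2 = tau / (gg * inv_norm_sum)
    return cos2 >= gamma2
```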