An Efficient Barzilai–Borwein Conjugate Gradient Method for Unconstrained Optimization

Journal of Optimization Theory and Applications
https://doi.org/10.1007/s10957-018-1393-3

Hongwei Liu 1 · Zexian Liu 1,2

Received: 17 December 2017 / Accepted: 14 September 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract The Barzilai–Borwein conjugate gradient methods, which were first proposed by Dai and Kou (Sci China Math 59(8):1511–1524, 2016), are very interesting and very efficient for strictly convex quadratic minimization. In this paper, we present an efficient Barzilai–Borwein conjugate gradient method for unconstrained optimization. Motivated by the Barzilai–Borwein method and the linear conjugate gradient method, we derive a new search direction satisfying the sufficient descent condition based on a quadratic model in a two-dimensional subspace, and design a new strategy for the choice of initial stepsize. A generalized Wolfe line search is also proposed, which is nonmonotone and can avoid a numerical drawback of the original Wolfe line search. Under mild conditions, we establish the global convergence and the R-linear convergence of the proposed method. In particular, we also analyze the convergence for convex functions. Numerical results show that, for the CUTEr library and the test problem collection given by Andrei, the proposed method is superior to two famous conjugate gradient methods, which were proposed by Dai and Kou (SIAM J Optim 23(1):296–320, 2013) and Hager and Zhang (SIAM J Optim 16(1):170–192, 2005), respectively.

Keywords Barzilai–Borwein method · Barzilai–Borwein conjugate gradient method · Subspace minimization · R-linear convergence · Nonmonotone Wolfe line search

Mathematics Subject Classification 49M37 · 65K05 · 90C30

Zexian Liu, [email protected] · Hongwei Liu, [email protected]

1 School of Mathematics and Statistics, Xidian University, Xi'an 710126, People's Republic of China
2 School of Mathematics and Computer Science, Hezhou University, Hezhou 542899, People's Republic of China


    1 Introduction

With wide applications in many fields, large-scale optimization problems have received much attention in recent years. Subspace methods are a class of very efficient numerical methods for large-scale optimization problems because it is not necessary to solve large-scale subproblems at each iteration [1]. The subspace minimization conjugate gradient (SMCG) method was first proposed by Yuan and Stoer [2], and its search direction is computed by minimizing a quadratic approximate model over a subspace spanned by the current gradient and the latest search direction. SMCG methods are a generalization of the classical conjugate gradient (CG) methods. They are a class of efficient optimization methods and have received some researchers' attention. Andrei [3] presented an SMCG method, where the search direction is generated in a special three-dimensional subspace. Based on [3], Yang et al. [4] developed another SMCG method, in which the search direction is also generated in a special three-dimensional subspace. Motivated by the Barzilai–Borwein (BB) method [5], Dai and Kou [6] developed some Barzilai–Borwein conjugate gradient (BBCG) methods for strictly convex quadratic minimization, and the numerical results in [6] showed that BBCG3 is the most efficient. Dai and Kou [6] believed that BBCG3 would become a strong candidate for large-scale unconstrained optimization and thus raised the question of how to extend BBCG3 to general unconstrained optimization. They also pointed out that it is of great significance to design a suitable line search when extending BBCG3 to unconstrained optimization.

Some famous CG software packages, such as CGOPT [7] and CG_DESCENT [8], are very efficient for the test problems in the CUTEr library [9]; but for the test problem collection (we call it 80pro_Andrei for short), which is often used and includes 80 unconstrained problems mainly from [10], they are not as efficient as they are for the CUTEr library. Can one develop a CG method for unconstrained optimization that is very efficient not only for the CUTEr library but also for 80pro_Andrei?

The aim of our work is to answer the above two questions. In this paper, we present an efficient Barzilai–Borwein conjugate gradient method for unconstrained optimization. Using the idea of the BB method [5] and some important properties of the linear CG method, we derive a search direction satisfying the sufficient descent condition based on a quadratic model in a subspace spanned by the current gradient and the latest search direction, and design a new strategy for the choice of initial stepsize. A generalized Wolfe line search is proposed, which is nonmonotone and can avoid a numerical drawback of the original Wolfe line search. Under mild conditions, we establish the global convergence and the R-linear convergence of the proposed method. In particular, we also analyze the convergence for convex functions. Numerical results show that the proposed method is superior to CGOPT for both 80pro_Andrei and the CUTEr library, is superior to CG_DESCENT (5.3) for 80pro_Andrei, and is competitive with CG_DESCENT (5.3) for the CUTEr library.

The remainder of this paper is organized as follows. Some preliminaries are given in Sect. 2. In Sect. 3, a Barzilai–Borwein conjugate gradient method for unconstrained optimization is presented, where the search direction, the strategy for choosing the initial stepsize and the generalized Wolfe line search are derived and analyzed. In Sect. 4, we


establish the convergence of the proposed method. We compare the proposed method with CG_DESCENT (5.3) and CGOPT in Sect. 5. Conclusions are given in the last section.

    2 Preliminaries

Consider the following unconstrained optimization problem

$\min_{x \in \mathbb{R}^n} f(x),$  (1)

where $f : \mathbb{R}^n \to \mathbb{R}$ is smooth and its gradient $g$ is available. More exactly, we assume that $f$ satisfies the following assumptions.

Assumption 2.1 (i) The objective function $f(x)$ is continuously differentiable in a neighborhood $\mathcal{N}$ of the level set $\mathcal{L}(x_0) = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$, where $x_0$ is the initial point; (ii) $f(x)$ is bounded below on $\mathbb{R}^n$; (iii) the gradient $g(x)$ is Lipschitz continuous in $\mathcal{N}$, namely, there exists $L > 0$ such that

$\| g(x) - g(y) \| \le L \| x - y \|, \quad \forall x, y \in \mathcal{N}.$

It is worth noting that Assumption 2.1 (ii) is weaker than the usual assumption in general CG methods that the level set $\mathcal{L}(x_0) = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

Throughout this paper, $g_k = g(x_k)$, $f_k = f(x_k)$, $s_{k-1} = x_k - x_{k-1}$, $y_{k-1} = g_k - g_{k-1}$, $I$ is the identity matrix and $\|\cdot\|$ denotes the Euclidean norm.

Conjugate gradient methods are a class of powerful methods for unconstrained optimization, especially large-scale unconstrained optimization. They take the following form:

$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots,$  (2)

where $\alpha_k$ is the stepsize and $d_k$ is the search direction given by

$d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k > 0, \end{cases}$

where $\beta_k$ is usually called the conjugate parameter.

Different choices of $\beta_k$ lead to different CG methods. Some well-known formulae for $\beta_k$ are the Fletcher–Reeves (FR) [11], Hestenes–Stiefel (HS) [12], Polak–Ribière–Polyak (PRP) [13] and Dai–Yuan (DY) [14] formulae, given by

$\beta_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{HS} = \frac{g_k^T y_{k-1}}{d_{k-1}^T y_{k-1}}, \quad \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \quad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}.$

CG methods have undergone great developments, and some recent advances on CG methods can be found in [7,8,15–21].
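As an illustration, the four classical conjugate parameters above can be computed side by side. This is a minimal sketch; the function name `classical_betas` and the NumPy representation are ours, not from the paper:

```python
import numpy as np

def classical_betas(g_new, g_old, d_old):
    """Compute the FR, HS, PRP and DY conjugate parameters for one CG iteration.

    g_new, g_old : gradients g_k and g_{k-1}; d_old : previous direction d_{k-1}.
    Assumes the denominators g_{k-1}^T g_{k-1} and d_{k-1}^T y_{k-1} are nonzero.
    """
    y = g_new - g_old                              # y_{k-1} = g_k - g_{k-1}
    beta_fr = (g_new @ g_new) / (g_old @ g_old)    # Fletcher-Reeves
    beta_hs = (g_new @ y) / (d_old @ y)            # Hestenes-Stiefel
    beta_prp = (g_new @ y) / (g_old @ g_old)       # Polak-Ribiere-Polyak
    beta_dy = (g_new @ g_new) / (d_old @ y)        # Dai-Yuan
    return beta_fr, beta_hs, beta_prp, beta_dy
```

With an exact line search on a quadratic, all four formulae coincide; the sketch only makes the algebraic differences between the denominators and numerators explicit.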


2.1 The Barzilai–Borwein Conjugate Gradient Method for Strictly Convex Quadratic Minimization

Since the idea of the BB method [5] was applied in BBCG3 [6], we first give a review of the BB method.

Consider the following strictly convex quadratic minimization problem

$\min_{x \in \mathbb{R}^n} q(x) = \frac{1}{2} x^T A x + b^T x,$  (3)

where $A \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite matrix and $b \in \mathbb{R}^n$. The BB method for solving (3) takes the form $x_{k+1} = x_k - \alpha_k g_k$, where $\alpha_k$ is given by

$\alpha_k^{BB1} = \frac{\|s_{k-1}\|^2}{s_{k-1}^T y_{k-1}} \quad \text{or} \quad \alpha_k^{BB2} = \frac{s_{k-1}^T y_{k-1}}{\|y_{k-1}\|^2},$

which can be computed by solving $\min_{\alpha > 0} \|\frac{1}{\alpha} s_{k-1} - y_{k-1}\|^2$ or $\min_{\alpha > 0} \|\alpha y_{k-1} - s_{k-1}\|^2$, respectively.

It is not difficult to see that the essence of the BB method is to approximate the Hessian matrix by the scalar matrix $\frac{1}{\alpha_k^{BB1}} I$ or $\frac{1}{\alpha_k^{BB2}} I$. Since $\frac{1}{\alpha_k^{BB1}}$ and $\frac{1}{\alpha_k^{BB2}}$ are both Rayleigh quotients of $A$, we know that

$\lambda_{\min}(A) \le \frac{1}{\alpha_k^{BB1}} \le \frac{1}{\alpha_k^{BB2}} \le \lambda_{\max}(A),$

where $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are the smallest and largest eigenvalues of $A$, respectively. It is easy to verify that if $\frac{1}{\alpha_k^{BB1}}$ is very small, so is $\lambda_{\min}(A)$, and if $\frac{1}{\alpha_k^{BB2}}$ is very large, so is $\lambda_{\max}(A)$. Some recent advances on the BB method can be found in [22–30].
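The BB iteration just reviewed can be sketched in a few lines. This is an illustrative implementation for the quadratic problem (3), not the authors' code; the choice of the first stepsize and the use of the long stepsize $\alpha_k^{BB1}$ throughout are our simplifying assumptions:

```python
import numpy as np

def bb_gradient_method(A, b, x0, tol=1e-8, max_iter=500):
    """Barzilai-Borwein method for min 0.5 x^T A x + b^T x, A symmetric PD.

    Uses the long stepsize alpha_BB1 = s^T s / s^T y from the second iteration
    on; the first stepsize 1/||A||_2 is an illustrative safe choice.
    """
    x = np.asarray(x0, dtype=float)
    g = A @ x + b                           # gradient of the quadratic
    alpha = 1.0 / np.linalg.norm(A, 2)      # initial stepsize (assumption)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - alpha * g               # BB step: x_{k+1} = x_k - alpha_k g_k
        g_new = A @ x_new + b
        s, y = x_new - x, g_new - g
        alpha = (s @ s) / (s @ y)           # alpha_BB1; s^T y > 0 since A is PD
        x, g = x_new, g_new
    return x
```

On a two-dimensional quadratic the iteration typically terminates in a handful of steps, which illustrates why the Rayleigh-quotient stepsizes are so effective.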

We subsequently review BBCG3 in detail. In BBCG3, the search direction is determined by solving the following quadratic approximate problem

$\min_{d \in V_k} \; g_k^T d + \frac{1}{2} d^T B_k d,$  (4)

where $V_k = \operatorname{Span}\{g_k, s_{k-1}\}$, and $B_k \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite approximation to the Hessian matrix satisfying the standard secant equation $B_k s_{k-1} = y_{k-1}$. Consider the general case that $g_k$ and $s_{k-1}$ are not collinear. Denote

$d = u g_k + v s_{k-1},$  (5)

where $u, v \in \mathbb{R}$. Substituting (5) into (4) and using the standard secant equation, we can rewrite (4) as follows:

$\min_{u,v \in \mathbb{R}} \; \begin{pmatrix} \|g_k\|^2 \\ g_k^T s_{k-1} \end{pmatrix}^{T} \begin{pmatrix} u \\ v \end{pmatrix} + \frac{1}{2} \begin{pmatrix} u \\ v \end{pmatrix}^{T} \begin{pmatrix} \rho_k & g_k^T y_{k-1} \\ g_k^T y_{k-1} & s_{k-1}^T y_{k-1} \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix},$  (6)


where $\rho_k$ is an estimate of $g_k^T B_k g_k$. Inspired by the BB method [5], Dai and Kou [6] approximated $B_k$ in the term $g_k^T B_k g_k$ by $\frac{3}{2} \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} I$ and thus obtained

$\rho_k = \frac{3}{2} \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} \|g_k\|^2,$  (7)

which implies that

$\Delta_k = \begin{vmatrix} \rho_k & g_k^T y_{k-1} \\ g_k^T y_{k-1} & s_{k-1}^T y_{k-1} \end{vmatrix} = \rho_k s_{k-1}^T y_{k-1} - \left(g_k^T y_{k-1}\right)^2 > 0.$  (8)

It is not difficult to obtain the unique solution of the quadratic approximate problem (6):

$u_k = \frac{1}{\Delta_k} \left( g_k^T y_{k-1} \, g_k^T s_{k-1} - s_{k-1}^T y_{k-1} \|g_k\|^2 \right),$  (9)

$v_k = \frac{1}{\Delta_k} \left( g_k^T y_{k-1} \|g_k\|^2 - \rho_k \, g_k^T s_{k-1} \right).$  (10)

BBCG3 for solving (3) takes the form $x_{k+1} = x_k + d_k$, where $d_k = u_k g_k + v_k s_{k-1}$, with $u_k$ and $v_k$ given by (9) and (10), respectively.

It is noted that if the objective function $f$ is quadratic and the line search is exact, the search direction of BBCG3 is parallel to the Hestenes–Stiefel CG direction [12].

    In this paper, our main task is to extend BBCG3 to unconstrained optimization.
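Formulas (7)–(10) translate directly into code. The sketch below is our own illustration (not the authors' implementation); it assumes $s_{k-1}^T y_{k-1} > 0$ and that $g_k$ and $s_{k-1}$ are not collinear:

```python
import numpy as np

def bbcg3_direction(g, s, y):
    """BBCG3 search direction d = u*g + v*s from Eqs. (7)-(10).

    g : current gradient g_k; s : s_{k-1}; y : y_{k-1}, with s^T y > 0.
    """
    sy, gy, gs, gg = s @ y, g @ y, g @ s, g @ g
    rho = 1.5 * (y @ y) / sy * gg        # Eq. (7): BB-type estimate of g^T B g
    delta = rho * sy - gy**2             # Eq. (8): determinant, positive
    u = (gy * gs - sy * gg) / delta      # Eq. (9)
    v = (gy * gg - rho * gs) / delta     # Eq. (10)
    return u * g + v * s
```

The pair $(u_k, v_k)$ is simply the solution of the $2 \times 2$ linear system given by the first-order conditions of (6), which is what the test of such a routine should check.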

3 The Barzilai–Borwein Conjugate Gradient Method for Unconstrained Optimization

In this section, we present an efficient Barzilai–Borwein conjugate gradient method for unconstrained optimization. Using the idea of the BB method and some important properties of the linear CG method, we derive a new search direction based on (6), and design a new strategy for the choice of initial stepsize. A generalized Wolfe line search is also developed. In the rest of the paper, we assume that $s_{k-1}^T y_{k-1} > 0$, which is guaranteed by the generalized Wolfe line search.

    3.1 Derivation of the New Search Direction

    According to some properties of the objective function f at the iterate xk , the derivationof the new search direction dk is divided into the following four cases.

Case I. If the exact line search is adopted, then $g_k^T s_{k-1} = 0$. Therefore, if $\left(g_k^T s_{k-1}\right)^2$ is very large, the current step $x_k$ might be far from the exact step $\hat{x}_k = x_{k-1} + \alpha_{k-1}^{e} d_{k-1}$, where $\alpha_{k-1}^{e} = \arg\min_{\alpha > 0} f(x_{k-1} + \alpha d_{k-1})$. Since the linear CG method with the exact line search enjoys quadratic termination for strictly convex quadratic functions, it is reasonable to believe that an iterate close to the exact step $\hat{x}_k$ might also be preferable in a nonlinear CG method. A natural idea is that if $g_k^T s_{k-1}$ is large, the new iterate should go back along $s_{k-1}$, and if $-g_k^T s_{k-1}$ is large, the new iterate should go ahead along $s_{k-1}$. This indicates that $u$ in (5) should be set to $u = 0$. Substituting $u = 0$ into (6), we obtain

$\hat{u}_k = 0 \quad \text{and} \quad \hat{v}_k = -\frac{g_k^T s_{k-1}}{s_{k-1}^T y_{k-1}}.$  (11)

Therefore, if the following conditions hold:

$\frac{\left(g_k^T s_{k-1}\right)^2}{s_{k-1}^T y_{k-1} \|g_k\|^2} \ge \xi_4 \quad \text{and} \quad \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \ge \frac{\xi_3}{\sqrt{k}},$  (12)

where $0.8 < \xi_4 \le 1$ and $0 < \xi_3 < 10^{-4}$, the search direction is computed by

$d_k = \hat{u}_k g_k + \hat{v}_k s_{k-1},$  (13)

where $\hat{u}_k$ and $\hat{v}_k$ are given by (11).

It is noted that our intention in replacing $\left(g_k^T s_{k-1}\right)^2 \ge \xi_4$ by $\frac{\left(g_k^T s_{k-1}\right)^2}{s_{k-1}^T y_{k-1} \|g_k\|^2} \ge \xi_4$ in the inequality (12) is to ensure that the search direction possesses some important properties described in Sect. 3.2.

It follows from (11) and (13) that if $g_k^T s_{k-1} > 0$, the search direction (13) goes back along $s_{k-1}$; otherwise, it goes ahead along $s_{k-1}$, which is consistent with the above idea.

Case II. For general functions, if $\frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \le \xi_2$ and $\frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \ge \frac{\xi_3}{\sqrt{k}}$, where $\xi_2 > 10^4$, the condition number of the Hessian matrix might not be very large. In this case, we simply use $\frac{3}{2} \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} I$ to estimate $B_k$ in the term $g_k^T B_k g_k$, which implies that $\rho_k = \frac{3}{2} \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} \|g_k\|^2$. This corresponds to BBCG3.

Therefore, if the conditions

$\frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \le \xi_2, \quad \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \ge \frac{\xi_3}{\sqrt{k}} \quad \text{and} \quad \frac{\left(g_k^T s_{k-1}\right)^2}{s_{k-1}^T y_{k-1} \|g_k\|^2} < \xi_4$  (14)

hold, the search direction $d_k$ is computed by

$d_k = u_k g_k + v_k s_{k-1},$  (15)

where $u_k$ and $v_k$ are given by (9) and (10), respectively.

Case III. If $\frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} > \xi_2$ and $\frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \ge \frac{\xi_3}{\sqrt{k}}$, the condition number of the Hessian matrix is likely to be very large. In this case, it might be too simple to use the scalar matrix $\frac{3}{2} \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} I$ to estimate $B_k$ in the term $g_k^T B_k g_k$. When $\left| g_k^T s_{k-1} \, g_k^T y_{k-1} \right|$ is very small, we use

$D_k = I + \frac{y_{k-1} y_{k-1}^T}{s_{k-1}^T y_{k-1}}$

to estimate $B_k$ in the term $g_k^T B_k g_k$, which suggests that

$\rho_k = \|g_k\|^2 + \frac{\left(g_k^T y_{k-1}\right)^2}{s_{k-1}^T y_{k-1}}.$  (16)

Substituting (16) into (6), we obtain

$\bar{u}_k = -1 + \frac{g_k^T y_{k-1} \, g_k^T s_{k-1}}{s_{k-1}^T y_{k-1} \|g_k\|^2}$  (17)

and

$\bar{v}_k = \left( 1 - \frac{g_k^T y_{k-1} \, g_k^T s_{k-1}}{s_{k-1}^T y_{k-1} \|g_k\|^2} \right) \frac{g_k^T y_{k-1}}{s_{k-1}^T y_{k-1}} - \frac{g_k^T s_{k-1}}{s_{k-1}^T y_{k-1}}.$  (18)

Therefore, if the following conditions hold:

$\frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} > \xi_2, \quad \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \ge \frac{\xi_3}{\sqrt{k}}, \quad \frac{\left| s_{k-1}^T g_k \, y_{k-1}^T g_k \right|}{s_{k-1}^T y_{k-1} \|g_k\|^2} \le \xi_1 \quad \text{and} \quad \frac{\left(g_k^T s_{k-1}\right)^2}{s_{k-1}^T y_{k-1} \|g_k\|^2} < \xi_4,$  (19)

where $0 < \xi_1 \le 10^{-4}$, the search direction is computed by

$d_k = \bar{u}_k g_k + \bar{v}_k s_{k-1},$  (20)

where $\bar{u}_k$ and $\bar{v}_k$ are given by (17) and (18), respectively.

Similar to Case I, our intention in replacing $\left| g_k^T s_{k-1} \, g_k^T y_{k-1} \right| \le \xi_1$ by $\frac{\left| s_{k-1}^T g_k \, y_{k-1}^T g_k \right|}{s_{k-1}^T y_{k-1} \|g_k\|^2} \le \xi_1$ in the inequality (19) is also to ensure that the search direction possesses some important properties described in Sect. 3.2.

Case IV. If none of the conditions (12), (14) and (19) holds, namely, if the conditions

$\frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} > \xi_2, \quad \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} \ge \frac{\xi_3}{\sqrt{k}}, \quad \frac{\left| s_{k-1}^T g_k \, y_{k-1}^T g_k \right|}{s_{k-1}^T y_{k-1} \|g_k\|^2} > \xi_1, \quad \frac{\left(g_k^T s_{k-1}\right)^2}{s_{k-1}^T y_{k-1} \|g_k\|^2} < \xi_4$  (21)

or the condition

$\frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}} < \frac{\xi_3}{\sqrt{k}}$  (22)

holds, the search direction $d_k$ is computed by

$d_k = -g_k.$  (23)

It is noted that the search direction (23) is also generated by solving (6) with $v = 0$ and $B_k = I$.

In conclusion, the new search direction can be stated as

$d_k = \begin{cases} \hat{u}_k g_k + \hat{v}_k s_{k-1}, & \text{if (12) holds}, \\ u_k g_k + v_k s_{k-1}, & \text{if (14) holds}, \\ \bar{u}_k g_k + \bar{v}_k s_{k-1}, & \text{if (19) holds}, \\ -g_k, & \text{if (21) or (22) holds}, \end{cases}$  (24)

where $\hat{u}_k$, $\hat{v}_k$, $u_k$, $v_k$, $\bar{u}_k$ and $\bar{v}_k$ are given by (11), (9), (10), (17) and (18), respectively.

Remark 3.1 The difference between the new search direction (24) and BBCG3 is that the new search direction $d_k$ is computed by (13), (15), (20) or (23), while the search direction of BBCG3, given by (15), is only a special case of the new search direction (24).

Remark 3.2 If $f$ is a quadratic function and the line search is exact, the search direction (20) reduces to the Hestenes–Stiefel CG direction.
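The four-case rule (24) can be sketched as follows. This is our illustrative reading of the case selection, with hypothetical parameter values chosen inside the stated ranges. It also uses the fact that in Case III $\rho_k s_{k-1}^T y_{k-1} - (g_k^T y_{k-1})^2 = \|g_k\|^2 s_{k-1}^T y_{k-1}$, so the pair (17)–(18) coincides with (9)–(10) evaluated at the $\rho_k$ of (16):

```python
import numpy as np

# Illustrative parameter values inside the ranges stated in the text (assumptions).
XI1, XI2, XI3, XI4 = 1e-4, 1e4 + 1, 1e-5, 0.9

def smcg_bb_direction(g, s, y, k):
    """Four-case search direction of Eq. (24); a sketch, not the authors' code.

    Assumes g != 0, s^T y > 0 and k >= 1.
    """
    sy, ss, yy = s @ y, s @ s, y @ y
    gs, gy, gg = g @ s, g @ y, g @ g
    if sy / ss < XI3 / np.sqrt(k):          # condition (22)
        return -g                           # Case IV: negative gradient
    if gs**2 / (sy * gg) >= XI4:            # conditions (12)
        return -(gs / sy) * s               # Case I: d = v_hat * s, Eq. (11)
    if yy / sy <= XI2:                      # conditions (14)
        rho = 1.5 * yy / sy * gg            # Case II: BBCG3 rho, Eq. (7)
    elif abs(gs * gy) / (sy * gg) <= XI1:   # conditions (19)
        rho = gg + gy**2 / sy               # Case III: rho from Eq. (16)
    else:
        return -g                           # conditions (21): Case IV
    delta = rho * sy - gy**2                # Eq. (8), positive in both cases
    u = (gy * gs - sy * gg) / delta         # Eq. (9)
    v = (gy * gg - rho * gs) / delta        # Eq. (10)
    return u * g + v * s
```

Whatever branch is taken, Lemma 3.1 below guarantees $g_k^T d_k \le -c_1 \|g_k\|^2$ whenever $s_{k-1}^T y_{k-1} > 0$, which is easy to verify numerically for random data.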

    3.2 Some Important Properties of the New Search Direction

We study some important properties of the new search direction (24). The following lemma indicates that the new search direction (24) satisfies the sufficient descent condition.

Lemma 3.1 Assume that $f$ satisfies Assumption 2.1. Then, the new search direction (24) satisfies the sufficient descent condition

$g_k^T d_k \le -c_1 \|g_k\|^2,$  (25)

where

$c_1 = \frac{2}{3 \xi_2}.$  (26)

    Proof We divide the proof into four cases.


Case I. $d_k = \hat{u}_k g_k + \hat{v}_k s_{k-1}$, where $\hat{u}_k$ and $\hat{v}_k$ are given by (11). We obtain

$g_k^T d_k = -\frac{g_k^T s_{k-1}}{s_{k-1}^T y_{k-1}} \, g_k^T s_{k-1} \le -\xi_4 \|g_k\|^2.$

Case II. $d_k = u_k g_k + v_k s_{k-1}$, where $u_k$ and $v_k$ are given by (9) and (10), respectively. It follows from (3.31) and (3.32) of [6] that

$g_k^T d_k \le -\frac{\|g_k\|^4}{\rho_k}.$

Combining (7) with (14), we deduce that

$g_k^T d_k \le -\frac{\|g_k\|^4}{\rho_k} = -\frac{2}{3} \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}} \|g_k\|^2 \le -\frac{2}{3 \xi_2} \|g_k\|^2.$

Case III. $d_k = \bar{u}_k g_k + \bar{v}_k s_{k-1}$, where $\bar{u}_k$ and $\bar{v}_k$ are given by (17) and (18), respectively. From (16) and (8), we get $\Delta_k = \|g_k\|^2 s_{k-1}^T y_{k-1}$. By (3.31) of [6], $0 < \xi_1 \le 10^{-4}$ and (19), we obtain

$g_k^T d_k = -\frac{\|g_k\|^4}{\Delta_k} \left[ s_{k-1}^T y_{k-1} - 2 \frac{g_k^T y_{k-1} \, g_k^T s_{k-1}}{\|g_k\|^2} + \rho_k \left( \frac{g_k^T s_{k-1}}{\|g_k\|^2} \right)^2 \right]$

$= -\|g_k\|^2 \left[ 1 - 2 \frac{g_k^T y_{k-1} \, g_k^T s_{k-1}}{s_{k-1}^T y_{k-1} \|g_k\|^2} + \frac{\rho_k}{s_{k-1}^T y_{k-1}} \left( \frac{g_k^T s_{k-1}}{\|g_k\|^2} \right)^2 \right]$

$\le -\left( 1 - 2 \frac{g_k^T y_{k-1} \, g_k^T s_{k-1}}{s_{k-1}^T y_{k-1} \|g_k\|^2} \right) \|g_k\|^2$

$\le -(1 - 2 \xi_1) \|g_k\|^2.$

Case IV. $d_k = -g_k$. We can easily establish that $g_k^T d_k = -\|g_k\|^2$.

Let $c_1 = \min\left\{ \frac{2}{3\xi_2}, \; 1 - 2\xi_1, \; \xi_4 \right\}$. It is clear from $0 < \xi_1 \le 10^{-4}$, $0.8 < \xi_4 \le 1$ and $\xi_2 > 10^4$ that $c_1 = \frac{2}{3\xi_2}$. Therefore, we obtain (25). The proof is completed. $\square$

Lemma 3.2 Assume that $f$ satisfies Assumption 2.1. Then, the new search direction (24) satisfies

$\|d_k\|^2 \le (c_2 + c_3 k) \|g_k\|^2,$  (27)

where $c_2$ and $c_3$ are given by

$c_2 = (1 + \xi_1)^2 \quad \text{and} \quad c_3 = \max\left\{ \frac{81}{\xi_3^2}, \; \frac{(1 + L + \xi_1 L)(3 + L + 2\xi_1 + \xi_1 L)}{\xi_3^2} \right\},$  (28)

respectively.


Proof We divide the proof into four cases.

Case I. $d_k = \hat{u}_k g_k + \hat{v}_k s_{k-1}$, where $\hat{u}_k$ and $\hat{v}_k$ are given by (11). We obtain

$\|d_k\|^2 = \frac{\left(g_k^T s_{k-1}\right)^2}{\left(s_{k-1}^T y_{k-1}\right)^2} \|s_{k-1}\|^2 \le \frac{\|s_{k-1}\|^4}{\left(s_{k-1}^T y_{k-1}\right)^2} \|g_k\|^2 \le \frac{k}{\xi_3^2} \|g_k\|^2.$

Case II. $d_k = u_k g_k + v_k s_{k-1}$, where $u_k$ and $v_k$ are given by (9) and (10), respectively. It follows from (8) and $\left(g_k^T y_{k-1}\right)^2 \le \|y_{k-1}\|^2 \|g_k\|^2$ that

$\Delta_k = \rho_k s_{k-1}^T y_{k-1} - \left(g_k^T y_{k-1}\right)^2 = \frac{3}{2} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \|g_k\|^2 s_{k-1}^T y_{k-1} - \left(g_k^T y_{k-1}\right)^2 \ge \frac{1}{2} \|y_{k-1}\|^2 \|g_k\|^2.$

Therefore,

$\|d_k\| = \left\| \frac{1}{\Delta_k} \left[ \left( g_k^T y_{k-1} \, g_k^T s_{k-1} - s_{k-1}^T y_{k-1} \|g_k\|^2 \right) g_k + \left( g_k^T y_{k-1} \|g_k\|^2 - \rho_k \, g_k^T s_{k-1} \right) s_{k-1} \right] \right\|$

$\le \frac{1}{\Delta_k} \left[ \left| g_k^T y_{k-1} \, g_k^T s_{k-1} - s_{k-1}^T y_{k-1} \|g_k\|^2 \right| \|g_k\| + \left| g_k^T y_{k-1} \|g_k\|^2 - \rho_k \, g_k^T s_{k-1} \right| \|s_{k-1}\| \right]$

$\le \frac{1}{\Delta_k} \left[ \left( \|y_{k-1}\| \|g_k\|^2 + \|y_{k-1}\| \|g_k\|^2 \right) \|s_{k-1}\| \|g_k\| + \left( \|g_k\| \|y_{k-1}\| \|g_k\|^2 + \rho_k \|g_k\| \|s_{k-1}\| \right) \|s_{k-1}\| \right]$

$= \frac{1}{\Delta_k} \left( 2 \|s_{k-1}\| \|y_{k-1}\| \|g_k\|^3 + \|y_{k-1}\| \|s_{k-1}\| \|g_k\|^3 + \rho_k \|g_k\| \|s_{k-1}\|^2 \right)$

$= \frac{1}{\Delta_k} \left( 3 \|s_{k-1}\| \|y_{k-1}\| + \rho_k \|s_{k-1}\|^2 / \|g_k\|^2 \right) \|g_k\|^3$

$\le \frac{2}{\|y_{k-1}\|^2} \left( 3 \|s_{k-1}\| \|y_{k-1}\| + \frac{3}{2} \frac{y_{k-1}^T y_{k-1}}{s_{k-1}^T y_{k-1}} \|s_{k-1}\|^2 \right) \|g_k\|$

$= \left( 6 \frac{\|s_{k-1}\|}{\|y_{k-1}\|} + 3 \frac{\|s_{k-1}\|^2}{s_{k-1}^T y_{k-1}} \right) \|g_k\|$

$\le \left( \frac{6 \sqrt{k}}{\xi_3} + \frac{3 \sqrt{k}}{\xi_3} \right) \|g_k\| = \frac{9 \sqrt{k}}{\xi_3} \|g_k\|,$

which implies that $\|d_k\|^2 \le \frac{81}{\xi_3^2} k \|g_k\|^2$.


Case III. $d_k = \bar{u}_k g_k + \bar{v}_k s_{k-1}$, where $\bar{u}_k$ and $\bar{v}_k$ are given by (17) and (18), respectively. We obtain

$\|d_k\| \le |\bar{u}_k| \|g_k\| + |\bar{v}_k| \|s_{k-1}\| \le (1 + \xi_1) \|g_k\| + \left[ \frac{(1 + \xi_1) L \|g_k\| \|s_{k-1}\|}{s_{k-1}^T y_{k-1}} + \frac{\|g_k\| \|s_{k-1}\|}{s_{k-1}^T y_{k-1}} \right] \|s_{k-1}\|$

$= (1 + \xi_1) \|g_k\| + (1 + L + \xi_1 L) \frac{\|s_{k-1}\|^2}{s_{k-1}^T y_{k-1}} \|g_k\|$

$\le \left[ (1 + \xi_1) + (1 + L + \xi_1 L) \frac{\sqrt{k}}{\xi_3} \right] \|g_k\|.$

Therefore,

$\|d_k\|^2 \le \left\{ (1 + \xi_1)^2 + \frac{[(1 + \xi_1) L + 1]^2}{\xi_3^2} k + 2 (1 + \xi_1)(1 + L + \xi_1 L) \frac{\sqrt{k}}{\xi_3} \right\} \|g_k\|^2$

$\le \left[ (1 + \xi_1)^2 + \frac{(1 + L + \xi_1 L)(3 + L + 2\xi_1 + \xi_1 L)}{\xi_3^2} k \right] \|g_k\|^2.$

Case IV. $d_k = -g_k$. We have $\|d_k\|^2 = \|g_k\|^2$.

Let $c_2 = \max\left\{ 1, (1 + \xi_1)^2 \right\}$ and $c_3 = \max\left\{ \frac{1}{\xi_3^2}, \; \frac{81}{\xi_3^2}, \; \frac{(1 + L + \xi_1 L)(3 + L + 2\xi_1 + \xi_1 L)}{\xi_3^2} \right\}$. It follows from $\xi_2 > 10^4$, $0 < \xi_1 \le 10^{-4}$ and $0.8 < \xi_4 \le 1$ that $c_2 = (1 + \xi_1)^2$ and $c_3 = \max\left\{ \frac{81}{\xi_3^2}, \; \frac{(1 + L + \xi_1 L)(3 + L + 2\xi_1 + \xi_1 L)}{\xi_3^2} \right\}$. Therefore, we obtain (27). The proof is completed. $\square$

3.3 The New Choice of Initial Stepsize and the Generalized Wolfe Line Search

It is universally acknowledged that the choice of initial stepsize is of great importance for an optimization method. Unlike general quasi-Newton methods, it is challenging to determine a suitable initial stepsize for a CG method. In this subsection, we design a new strategy for the choice of initial stepsize, and develop a generalized Wolfe line search.

Denote

$\phi_k(\alpha) = f(x_k + \alpha d_k), \quad \alpha \ge 0.$

Hager and Zhang [8] chose the initial stepsize in CG_DESCENT as follows:

$\alpha_k^0 = \begin{cases} \arg\min q\left( \phi_k(0), \phi_k'(0), \phi_k(\tau_1 \alpha_{k-1}) \right), & \text{if } \phi_k(\tau_1 \alpha_{k-1}) \le \phi_k(0), \\ \tau_2 \alpha_{k-1}, & \text{otherwise}, \end{cases}$  (29)


where $\tau_1 > 0$, $\tau_2 > 0$ and $q\left( \phi_k(0), \phi_k'(0), \phi_k(\tau_1 \alpha_{k-1}) \right)$ is the interpolation function that matches $\phi_k(0)$, $\phi_k'(0)$ and $\phi_k(\tau_1 \alpha_{k-1})$. Dai and Kou [7] determined the initial stepsize in CGOPT as follows:

$\alpha_k^0 = \begin{cases} \alpha, & \text{if } |\phi_k(\alpha) - \phi_k(0)| / (\varepsilon_1 + |\phi_k(0)|) > \varepsilon_2, \\ \arg\min q\left( \phi_k(0), \phi_k'(0), \phi_k(\alpha) \right), & \text{otherwise}, \end{cases}$  (30)

where $\alpha = \max\left\{ \varepsilon_3 \alpha_{k-1}, \; -2 |f_k - f_{k-1}| / g_k^T d_k \right\}$, $\varepsilon_1 > 0$, $\varepsilon_2 > 0$ and $\varepsilon_3 > 0$.

Unlike the above two strategies, we develop a new strategy for the choice of initial stepsize.

According to [28,31],

$\mu_k = \left| \frac{2 \left( f_{k-1} - f_k + g_k^T s_{k-1} \right)}{s_{k-1}^T y_{k-1}} - 1 \right|$  (31)

is a quantity showing how close $f(x)$ is to a quadratic on the line segment between $x_{k-1}$ and $x_k$. If the following condition [29,30] holds:

$\mu_k \le \xi_5 \quad \text{or} \quad \max\{\mu_k, \mu_{k-1}\} \le \xi_6,$  (32)

where $0 < \xi_5 < \xi_6$, then $f$ might be very close to a quadratic on the line segment between $x_{k-1}$ and $x_k$. It is well known that the linear CG method with the exact line search enjoys quadratic termination for strictly convex quadratic functions. In addition, Andrei [32] argued that the higher the accuracy of the stepsize, the faster the convergence rate of a CG method. Based on the above observations, if $f$ is close to a quadratic on the line segment between $x_{k-1}$ and $x_k$, it is reasonable to choose the minimizer of the interpolation function $q\left( \phi_k(0), \phi_k'(0), \phi_k(\alpha) \right)$ as the initial stepsize for a CG method, where $\alpha > 0$ is a trial stepsize.

Case I. The initial stepsize for the search direction (13), (15) or (20).

In Newton-like methods, the choice $\alpha_k^0 = 1$ is essential for a rapid convergence rate [7]. Since the direction $d_k$ is generated by solving (6), it is reasonable to believe that $\alpha_k^0 = 1$ might be a good trial initial stepsize.

Denote

$\bar{\alpha}_k = \arg\min q\left( \phi_k(0), \phi_k'(0), \phi_k(1) \right).$  (33)

If the condition (32) holds and $\bar{\alpha}_k > 0$, we set the initial stepsize as

$\hat{\alpha}_k = \min\left\{ \max\left\{ \bar{\alpha}_k, \lambda_{\min} \right\}, \lambda_{\max} \right\},$  (34)

where $\lambda_{\max} > \lambda_{\min} > 0$. Therefore, if $d_k$ is computed by (13), (15) or (20), the initial stepsize is determined by

$\alpha_k^0 = \begin{cases} \hat{\alpha}_k, & \text{if (32) holds and } \bar{\alpha}_k > 0, \\ 1, & \text{otherwise}. \end{cases}$  (35)
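Since $q\left( \phi_k(0), \phi_k'(0), \phi_k(\alpha) \right)$ is the quadratic $q(t) = \phi_k(0) + \phi_k'(0)\,t + c\,t^2$ with $c = \left(\phi_k(\alpha) - \phi_k(0) - \phi_k'(0)\,\alpha\right)/\alpha^2$, its minimizer and the safeguard (34) can be sketched as follows. This is our own illustration, written for a general trial stepsize $\alpha$ (so $\alpha = 1$ recovers (33)); returning `None` to signal "fall back to the default stepsize" is our naming convention:

```python
def quad_interp_stepsize(phi0, dphi0, alpha, phi_alpha,
                         lam_min=1e-30, lam_max=1e30):
    """Minimizer of the quadratic interpolating phi(0), phi'(0), phi(alpha),
    clipped to [lam_min, lam_max] as in Eqs. (33)-(34); an illustrative sketch.
    Returns None when the interpolant has no positive minimizer.
    """
    # Curvature coefficient of q(t) = phi0 + dphi0*t + c*t^2.
    c = (phi_alpha - phi0 - dphi0 * alpha) / alpha**2
    if c <= 0.0:
        return None                     # no minimizer: q is concave or affine
    alpha_bar = -dphi0 / (2.0 * c)      # unconstrained minimizer, cf. Eq. (33)
    if alpha_bar <= 0.0:
        return None                     # condition "alpha_bar > 0" in (35) fails
    return min(max(alpha_bar, lam_min), lam_max)   # safeguard, Eq. (34)
```

For example, for $\phi(t) = (t-2)^2$ the data $\phi(0)=4$, $\phi'(0)=-4$, $\phi(1)=1$ recover the exact minimizer $t = 2$.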


Remark 3.3 It is remarkable that if $f$ is quadratic, $\lambda_{\min} = 0$ and $\lambda_{\max} = +\infty$, then (32) always holds, which implies that the initial stepsize (35) is exact and thus the line search need not be invoked.

Case II. The initial stepsize for the negative gradient direction (23).

It is recognized that the gradient method with the adaptive BB stepsize [33] is very efficient for strictly convex quadratic minimization, especially when the condition number is large, so we also design a new adaptive BB stepsize for $-g_k$. If the exact line search is adopted, then $g_k^T s_{k-1} = 0$, which implies that if $g_k^T s_{k-1} > 0$, the stepsize $\alpha_{k-1}$ is larger than the exact stepsize $\alpha_{k-1}^{e} = \arg\min_{\alpha > 0} f(x_{k-1} + \alpha d_{k-1})$. In order to compensate for the gap, the initial stepsize $\alpha_k^0$ should be determined by the short BB stepsize $\alpha_k^{BB2}$. Similarly, if $g_k^T s_{k-1} \le 0$, the initial stepsize $\alpha_k^0$ should be determined by the long BB stepsize $\alpha_k^{BB1}$. Therefore, the trial initial stepsize is determined by

$\alpha_k = \begin{cases} \max\left\{ \min\left\{ \lambda_k \alpha_k^{BB2}, \lambda_{\max} \right\}, \lambda_{\min} \right\}, & \text{if } g_k^T s_{k-1} > 0, \\ \max\left\{ \min\left\{ \lambda_k \alpha_k^{BB1}, \lambda_{\max} \right\}, \lambda_{\min} \right\}, & \text{if } g_k^T s_{k-1} \le 0, \end{cases}$  (36)

where $\lambda_k$ is a scaling parameter given by

$\lambda_k = \begin{cases} 0.9, & \text{if } n > 10 \text{ and Numgrad} > 5, \\ 1, & \text{otherwise}, \end{cases}$

where Numgrad denotes the number of successive uses of the negative gradient direction.

Denote

$\tilde{\tilde{\alpha}}_k = \arg\min q\left( \phi_k(0), \phi_k'(0), \phi_k(\alpha_k) \right).$  (37)

If $d_{k-1} = -g_{k-1}$, $\|g_k\|^2 \le 1$ and $\tilde{\tilde{\alpha}}_k > 0$, the initial stepsize is determined by

$\tilde{\alpha}_k = \min\left\{ \max\left\{ \tilde{\tilde{\alpha}}_k, \lambda_{\min} \right\}, \lambda_{\max} \right\}.$  (38)

Therefore, if $d_k = -g_k$, the initial stepsize is computed by

$\alpha_k^0 = \begin{cases} \tilde{\alpha}_k, & \text{if (32) holds, } d_{k-1} = -g_{k-1}, \|g_k\|^2 \le 1 \text{ and } \tilde{\tilde{\alpha}}_k > 0, \\ \alpha_k, & \text{otherwise}. \end{cases}$  (39)

As commented by Dai and Kou [6], it is important to design a suitable line search when extending BBCG3 to unconstrained optimization. We next develop a generalized Wolfe line search.

Recall the standard Wolfe line search:

$f(x_k + \alpha_k d_k) \le f(x_k) + \sigma \alpha_k g_k^T d_k,$  (40)

$g_{k+1}^T d_k \ge \delta g_k^T d_k,$  (41)


where $0 < \sigma < \delta < 1$. If $f$ satisfies Assumption 2.1, then for a descent direction $d_k$ there theoretically exist stepsizes $\alpha_k$ satisfying the standard Wolfe line search (40) and (41). In practice, however, (40) and (41) might never be satisfied due to numerical errors [7]. To avoid this numerical drawback, Dai and Kou [7] introduced an improved Wolfe line search in CGOPT:

$f(x_k + \alpha_k d_k) \le f(x_k) + \min\left\{ \sigma \alpha_k g_k^T d_k + \eta_k, \; \gamma |f(x_k)| \right\}$  (42)

and (41), where $\gamma > 0$, $0 < \sigma < \delta < 1$, $\eta_k > 0$ and $\sum_{k=1}^{+\infty} \eta_k < +\infty$. Zhang and Hager [34] presented an efficient nonmonotone Wolfe line search (the Zhang–Hager line search):

$f(x_k + \alpha_k d_k) \le C_k + \sigma \alpha_k g_k^T d_k$  (43)

and (41), where $0 < \sigma < \delta < 1$,

$C_0 = f(x_0), \quad Q_0 = 1, \quad Q_{k+1} = t_k Q_k + 1, \quad C_{k+1} = \frac{t_k Q_k C_k + f_{k+1}}{Q_{k+1}}, \quad t_k \in [\eta_{\min}, \eta_{\max}],$  (44)

and $0 \le \eta_{\min} \le \eta_{\max} \le 1$. The parameter $t_k$ is often used to control the degree of nonmonotonicity.

Since numerical errors occur and the BB method is incorporated into the new search direction (24), our intuition tells us that a nonmonotone line search is more suitable. Motivated by Dai and Kou [7] and by Zhang and Hager [34], we design the generalized Wolfe line search:

$f(x_k + \alpha_k d_k) \le f(x_k) + \eta_k + \sigma \alpha_k g_k^T d_k$  (45)

and (41), where $0 < \sigma < \delta < 1$ and $\eta_k \ge 0$ satisfies $\lim_{k \to +\infty} k \eta_k = 0$. Here we take $\eta_k$ as

$\eta_k = \begin{cases} 0, & \text{if } k = 0, \\ \min\left\{ \frac{1}{k \lg(k/n + 12)}, \; C_k - f(x_k) \right\}, & \text{if } k \ge 1, \end{cases}$  (46)

where $C_k$ is given by (44) and

$0 \le \eta_{\min} \le t_k \le \eta_{\max} \le 1.$  (47)

Clearly, (46) satisfies $\lim_{k \to +\infty} k \eta_k = 0$. It is easy to see that a stepsize $\alpha_k$ satisfying (45) and (41) also satisfies the Zhang–Hager line search (43) and (41). In the generalized Wolfe line search (45) and (41), the term $\frac{1}{k \lg(k/n + 12)}$ in (46) is used to generate nonmonotonicity, and the term $C_k - f(x_k)$ in (46) is used to control the total degree of nonmonotonicity, which can be guaranteed by (47). In addition, the reason for choosing the term $C_k - f(x_k)$ is to obtain the R-linear convergence of the proposed method (see Theorem 4.3). It is also easy to see that if $f$ satisfies Assumption 2.1, there theoretically exist stepsizes $\alpha_k$ satisfying (45) and (41) for a descent direction $d_k$.
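One update of the averages (44) and of the relaxation term (46) can be sketched as follows. This is an illustration only: fixing $t_k$ at a constant in $[\eta_{\min}, \eta_{\max}]$ and reading $\lg$ as $\log_{10}$ are our assumptions:

```python
import math

def update_eta(k, n, f_k, C_k, Q_k, f_next, t_k=0.5):
    """Compute eta_k of Eq. (46) from the current averages, then advance the
    Zhang-Hager recurrences (44) to (C_{k+1}, Q_{k+1}). Illustrative sketch.
    """
    if k == 0:
        eta = 0.0
    else:
        # min of the nonmonotonicity term and the cap C_k - f(x_k).
        eta = min(1.0 / (k * math.log10(k / n + 12.0)), C_k - f_k)
    Q_next = t_k * Q_k + 1.0                       # Eq. (44)
    C_next = (t_k * Q_k * C_k + f_next) / Q_next   # Eq. (44)
    return eta, C_next, Q_next
```

Since $k/n + 12 \ge 12$, the nonmonotonicity term is at most roughly $1/(1.08\,k)$, so $k\eta_k \to 0$ is immediate; the cap $C_k - f(x_k)$ then bounds the cumulative relaxation, as required for the R-linear convergence argument.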


    3.4 Description of the Barzilai–Borwein Conjugate Gradient Method

Denote

$r_{k-1} = \frac{2 (f_k - f_{k-1})}{g_{k-1}^T s_{k-1} + g_k^T s_{k-1}}.$  (48)

According to [7], if $r_{k-1}$ is close to 1, then $\phi_{k-1}(\alpha)$ is close to a quadratic function. Similar to [7], if there are many consecutive iterations in which $|r_k - 1| \le \xi_7$, where $\xi_7 > 0$, we restart the proposed method. In addition, if the number of successive uses of the CG direction reaches the threshold MaxRestart, we also restart the proposed method. We now describe the Barzilai–Borwein conjugate gradient method in detail.

    Algorithm 1 (SMCG_BB)

    Step 0. Given x0 ∈ Rn, ε > 0, λmin, λmax, σ, δ, ξ1, ξ2, ξ3, ξ4, ξ5, ξ6, ξ7, α00,MaxRestart,MinQuad, Set IterRestart :=0, Numgrad :=0, IterQuad :=0,Numcongrad :=0 and k := 0.

    Step 1. If ||g0||∞ ≤ ε, stop.Step 2. Set d0 = −g0 and Numgrad :=1.Step 3. If k = 0, go to Step 4. If dk is determined by (23), compute α0k by (39),

    otherwise compute α0k by (35).Step 4. Determine αk satisfying the generalizedWolfe line search (45) and (41) with

    α0k .Step 5. Set xk+1 = xk + αkdk . If ||gk ||∞ ≤ ε, stop. Otherwise, set IterRestart

    :=IterRestart+1. If |rk − 1| ≤ ξ7 or∣∣ fk+1 − fk − 0.5 (gTk sk + gTk+1sk)∣∣ ≤

    ξ7/7, IterQuad := IterQuad+1, else IterQuad :=0.Step 6. Compute the search direction.

Step 6.1 If condition (21) or (22) holds, or Numcongrad = MaxRestart, or (IterQuad = MinQuad and IterRestart = IterQuad), compute the search direction dk+1 by (23). Set Numgrad := Numgrad + 1, Numcongrad := 0 and IterRestart := 0, and go to Step 7.

Step 6.2 If the conditions (12) hold, compute dk+1 by (13). Set Numgrad := 0 and Numcongrad := Numcongrad + 1, and go to Step 7.

Step 6.3 If the conditions (14) hold, compute dk+1 by (15). Set Numgrad := 0 and Numcongrad := Numcongrad + 1, and go to Step 7.

Step 6.4 If the conditions (19) hold, compute dk+1 by (20). Set Numgrad := 0 and Numcongrad := Numcongrad + 1.

Step 7. Compute ηk+1 by (46), set k := k + 1, and go to Step 3.
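The counter update in Step 5 can be isolated as a small helper. Conditions and thresholds follow the step as printed (ξ7 = 7 × 10⁻⁸ from Sect. 5), with the trapezoid test written out explicitly; the argument names are ours, for illustration only.

```python
def update_iterquad(iterquad, r_k, f_k, f_next, gk_sk, gk1_sk, xi7=7e-8):
    """Step 5 of Algorithm 1: count consecutive iterations on which the
    objective looks locally quadratic; reset the counter otherwise."""
    trapezoid_gap = abs(f_next - f_k - 0.5 * (gk_sk + gk1_sk))
    if abs(r_k - 1.0) <= xi7 or trapezoid_gap <= xi7 / 7.0:
        return iterquad + 1
    return 0
```

Once the counter reaches MinQuad (with IterRestart = IterQuad), Step 6.1 triggers a restart with the direction (23).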

    4 Convergence Analysis

Under Assumption 2.1, we prove that SMCG_BB is globally convergent, analyze the convergence for convex functions and establish the R-linear convergence for uniformly convex functions.


Lemma 4.1 Assume that f satisfies Assumption 2.1. Then,

αk ≥ (1 − δ)c1 / ((c2 + c3k)L),

where c1, c2, c3 and δ are given by (26), (28) and (41), respectively.

Proof By (41) and Assumption 2.1, we have

(δ − 1)gk^T dk ≤ g(xk + αkdk)^T dk − gk^T dk = (g(xk + αkdk) − gk)^T dk ≤ Lαk‖dk‖²,

which shows that

αk ≥ (δ − 1)gk^T dk / (L‖dk‖²).   (49)

By (49), Lemmas 3.1 and 3.2, we obtain

αk ≥ (δ − 1)gk^T dk / (L‖dk‖²) ≥ (c1(1 − δ)/L) · ‖gk‖²/‖dk‖² ≥ (1 − δ)c1 / (L(c2 + c3k)).   (50)

The proof is completed.

    Theorem 4.1 Assume that f satisfies Assumption 2.1. Then,

lim inf_{k→+∞} ‖gk‖ = 0.   (51)

Proof By (41) and (50), we have

c1²σ(1 − δ) / (2L max{c2, c3}k) · ‖gk‖² ≤ c1²σ(1 − δ) / (L(c2 + c3k)) · ‖gk‖² ≤ −c1σ(1 − δ)gk^T dk / (L(c2 + c3k)) ≤ −σαk gk^T dk,

which together with (45) yields

c1²σ(1 − δ) / (2L max{c2, c3}k) · ‖gk‖² ≤ fk + ηk − fk+1.   (52)

Therefore, we have k(fk + ηk − fk+1) ≥ 0, which together with lim_{k→+∞} kηk = 0 means that

lim inf_{k→+∞} k(fk − fk+1) ≥ lim inf_{k→+∞} k(fk + ηk − fk+1) + lim inf_{k→+∞} (−kηk) = l ≥ 0.

Suppose by contradiction that l > 0. For ε = min{1, l/2} > 0, there exists N > 0 such that, for all k > N,

fk+1 < fk − ε/k.


Combining the above inequality with Σ_{k=N+1}^{+∞} 1/k = +∞, we get lim_{k→+∞} fk = −∞, which contradicts Assumption 2.1 (ii). Therefore,

lim inf_{k→+∞} k(fk + ηk − fk+1) = l = 0,

which together with (52) and lim_{k→+∞} kηk = 0 implies (51). The proof is completed.

Remark 4.1 By Lemma 3.2 and Theorem 4.1, we know that the convergence result (51) is established under the conditions ‖dk‖² ≤ (c2 + c3k)‖gk‖² and ηmax ≤ 1, which are different from the conditions ‖dk‖² ≤ c2‖gk‖² and ηmax ≤ 1 used in Theorem 2.2 in [34]. It is not difficult to verify that, under the conditions ‖dk‖² ≤ (c2 + c3k)‖gk‖² and ηmax ≤ 1, (51) cannot be obtained in a way similar to Theorem 2.2 in [34]. Therefore, the proof of Theorem 4.1 is nontrivial.

    For convex functions, we have the following result.

Theorem 4.2 Assume that f satisfies Assumption 2.1, the level set L(x0) = {x ∈ R^n : f(x) ≤ f(x0)} is bounded and f is convex on R^n. Then,

lim_{k→+∞} ‖gk‖ = 0.   (53)

Proof Denote K̄ = {k : fk+1 > fk}.

Case I: K̄ is a finite set. In this case, there exists an integer N0 > 0 such that {fk : k > N0} is monotonically decreasing, which together with Assumption 2.1 (i) yields lim_{k→+∞} fk = f̄, where f̄ = lim inf_{k→∞} fk. By Theorem 4.1, we know that there exists a subsequence {g_{kj}} such that lim_{j→+∞} ‖g_{kj}‖ = 0. It follows from the boundedness of L(x0) that there exists a convergent subsequence of {x_{kj}}. Without loss of generality, we assume x_{kj} → x̄ as j → +∞. Then, we have lim_{j→+∞} ‖g_{kj}‖ = ‖g(x̄)‖ = 0 and lim_{j→+∞} f_{kj} = f(x̄) = f̄. Since f is convex on R^n, x̄ is a global minimizer and thus f̄ ≤ f(x) holds for all x ∈ R^n. Now, we prove that lim_{k→+∞} ‖gk‖ = 0. Assume that there exists a subsequence {x_{ki}} of {xk} such that lim_{i→+∞} ‖g_{ki}‖ = l > 0. It follows from the boundedness of L(x0) that there exists a convergent subsequence of {x_{ki}}. Without loss of generality, we suppose x_{ki} → x̃ as i → +∞. Then, we have lim_{i→+∞} ‖g_{ki}‖ = ‖g(x̃)‖ = l > 0 and lim_{i→+∞} f_{ki} = f(x̃) = f̄. Consequently, x̃ is a global minimizer and thus ‖g(x̃)‖ = 0, which contradicts ‖g(x̃)‖ = l > 0. Therefore, we obtain (53).

Case II: K̄ is an infinite set. We rewrite K̄ as K̄ = {kj | f_{kj+1} > f_{kj}, j = 1, 2, . . .}. From (52), we know that

c1²σ(1 − δ) / (2L max{c2, c3}kj) · ‖g_{kj}‖² ≤ f_{kj} + η_{kj} − f_{kj+1} ≤ η_{kj}.


Combining the above inequality with lim_{j→+∞} kj η_{kj} = 0, we obtain

lim_{j→+∞} ‖g_{kj}‖ = 0.   (54)

Since f is convex on R^n, 0 ≤ f_{kj} − f* ≤ g_{kj}^T (x_{kj} − x*), where x* is a global minimizer and f* = f(x*). It follows from the boundedness of L(x0) and (54) that

lim_{j→+∞} f_{kj} = f*.   (55)

By f* ≤ f_{kj+1} ≤ f_{kj} + η_{kj} + σα_{kj} g_{kj}^T d_{kj} ≤ f_{kj} + η_{kj} and lim_{j→+∞} kj η_{kj} = 0, we get

lim_{j→+∞} f_{kj+1} = f*.   (56)

Denote k0 = −1. For any k ≥ 1, it is not difficult to see that there exists a positive integer kj such that kj−1 + 1 ≤ k ≤ kj and

f* ≤ fk ≤ max{f_{kj−1+1}, f_{kj}}.   (57)

It follows from (55), (56) and (57) that lim_{k→+∞} fk = f*. Similar to Case I, we also obtain (53). The proof is completed.

Remark 4.2 The convergence result lim_{k→+∞} ‖gk‖ = 0 is established under the hypothesis of convexity, which is weaker than the hypothesis of uniform convexity used in the convergence analysis for general CG methods.

The following theorem indicates that SMCG_BB is R-linearly convergent for uniformly convex functions.

Theorem 4.3 Suppose that f is uniformly convex with unique minimizer x*, ηmax < 1, the gradient g is Lipschitz continuous on bounded sets, and there exists μmax > 0 such that αk ≤ μmax for all k. Then, there exists θ ∈ ]0, 1[ such that

fk − f(x*) ≤ θ^k (f0 − f(x*)).

Proof By (45) and (46), we get

f(xk + αkdk) ≤ f(xk) + ηk + σαk gk^T dk ≤ Ck + σαk gk^T dk,   (58)

which shows that the stepsizes αk in SMCG_BB also satisfy the Zhang–Hager line search (43) and (41). Since f is uniformly convex, there exists a scalar γ > 0 such that f(x) ≥ f(y) + g(y)^T(x − y) + (1/(2γ))‖x − y‖² for all x, y ∈ R^n, which immediately implies (g(x) − g(y))^T(x − y) ≥ (1/γ)‖x − y‖² for all x, y ∈ R^n. Setting x = xk and y = xk−1 in this inequality, we easily obtain


‖sk−1‖² / (sk−1^T yk−1) ≤ γ,  ‖sk−1‖⁴ / (sk−1^T yk−1)² ≤ γ²  and  ‖sk−1‖/‖yk−1‖ ≤ γ,

which together with the proof of Lemma 3.2 means that there exists c > 0 such that

‖dk‖ ≤ c‖gk‖.   (59)

By (49), (59) and Lemma 3.1, we can obtain

αk ≥ (δ − 1)gk^T dk / (L‖dk‖²) ≥ (1 − δ)c1‖gk‖² / (L‖dk‖²) ≥ (1 − δ)c1 / (Lc²) ≜ β̄.   (60)

According to (43), (44) and Lemma 3.1, we know that Ck+1 = (tk Qk Ck + fk+1)/Qk+1 = Ck + (fk+1 − Ck)/Qk+1 ≤ Ck. If there exists a positive integer k0 such that ‖g_{k0}‖ = 0, we can easily show that there exists 0 < θ < 1 such that Ck+1 − f(x*) ≤ θ(Ck − f(x*)), which yields fk − f(x*) ≤ θ^k(f0 − f(x*)). In what follows, we only consider the case ‖gk‖ ≠ 0 for all k ≥ 0. In this case, we have Ck > Ck+1 > f(x*), which means that

0 < (Ck+1 − f(x*)) / (Ck − f(x*)) < 1, ∀k ≥ 0.   (61)

Denote r = lim sup_{k→∞} (Ck+1 − f(x*)) / (Ck − f(x*)). Clearly, 0 ≤ r ≤ 1.

First of all, we consider the case r = 1. It follows from r = 1 that there exists a subsequence {x_{kj}} such that

lim_{j→∞} (C_{kj+1} − f(x*)) / (C_{kj} − f(x*)) = 1.   (62)

Since ηmax < 1, by (2.15) of [34] we deduce that 0 < 1 − ηmax ≤ 1/Qk ≤ 1, where Qk is given by (44). Therefore, there exists a subsequence of {x_{kj}} such that the corresponding subsequence of {1/Q_{kj+1}} is convergent. Without loss of generality, we assume that

lim_{j→∞} 1/Q_{kj+1} = r1.   (63)

Clearly,

0 < 1 − ηmax ≤ r1 ≤ 1.   (64)

Since

(C_{kj+1} − f(x*)) / (C_{kj} − f(x*)) = (1 − 1/Q_{kj+1}) + (1/Q_{kj+1}) · (f_{kj+1} − f(x*)) / (C_{kj} − f(x*)),


it follows from (62), (63) and (64) that

lim_{j→∞} (f_{kj+1} − f(x*)) / (C_{kj} − f(x*)) = 1.   (65)

According to (58), Lemma 3.1 and (60), we have f_{kj+1} − f(x*) ≤ C_{kj} − f(x*) − c1σβ̄‖g_{kj}‖². Dividing this inequality by C_{kj} − f(x*), and using (65) and f(x*) < C_{kj}, we obtain

lim_{j→∞} ‖g_{kj}‖² / (C_{kj} − f(x*)) = 0.   (66)

Since f is uniformly convex, we know from (3.4) of [34] that

f_{kj+1} − f(x*) ≤ γ‖g_{kj+1}‖².   (67)

It follows from the Lipschitz continuity of g, (59) and αk ≤ μmax that

‖g_{kj+1}‖ ≤ ‖g_{kj+1} − g_{kj}‖ + ‖g_{kj}‖ ≤ L‖x_{kj+1} − x_{kj}‖ + ‖g_{kj}‖ = Lα_{kj}‖d_{kj}‖ + ‖g_{kj}‖ ≤ (1 + μmax Lc)‖g_{kj}‖,

which together with (67) yields

f_{kj+1} − f(x*) ≤ γ(1 + μmax Lc)²‖g_{kj}‖².

Dividing this inequality by C_{kj} − f(x*), and using (66), we obtain lim_{j→∞} (f_{kj+1} − f(x*)) / (C_{kj} − f(x*)) = 0, which contradicts (65). Therefore, the case r = 1 does not occur; that is,

lim sup_{k→∞} (Ck+1 − f(x*)) / (Ck − f(x*)) = r < 1.   (68)

It follows from (68) that there exists an integer T > 0 such that

(Ck+1 − f(x*)) / (Ck − f(x*)) ≤ r + (1 − r)/2 = 1/2 + r/2 < 1   (69)

holds for all k > T. From (61), we know that 0 < max_{0≤k≤T} {(Ck+1 − f(x*)) / (Ck − f(x*))} = r1 < 1. Let θ = max{1/2 + r/2, r1}. Clearly, 0 < θ < 1. It follows from (69) that

Ck+1 − f(x*) ≤ θ(Ck − f(x*)),   (70)


which indicates that

Ck+1 − f(x*) ≤ θ(Ck − f(x*)) ≤ θ^{k+1}(C0 − f(x*)).   (71)

By (44) and (58), we obtain Ck+1 = (tk Qk Ck + fk+1)/Qk+1 ≥ fk+1, which together with (71) yields fk+1 − f(x*) ≤ Ck+1 − f(x*) ≤ θ^{k+1}(C0 − f(x*)). Therefore, fk − f(x*) ≤ θ^k(f0 − f(x*)). The proof is completed.

    5 Numerical Experiments

We compare the performance of SMCG_BB with that of CGOPT and CG_DESCENT (5.3). The C codes of CG_DESCENT (5.3) and CGOPT can be downloaded from http://users.clas.ufl.edu/hager/papers/Software and http://coa.amss.ac.cn/wordpress/?page_id=21, respectively.

In the numerical experiments, we choose the following parameters for SMCG_BB: ε = 10⁻⁶, σ = 0.01, λmin = 10⁻³⁰, λmax = 10³⁰, tk = 0.9999, δ = 0.9999, ξ1 = 10⁻⁴, ξ2 = 10⁶, ξ3 = 10⁻⁸, ξ4 = 0.875, ξ5 = 10⁻⁵, ξ6 = 0.06, ξ7 = 7 × 10⁻⁸, MinQuad = 3, MaxRestart = 4n and

α0^0 = 1,                              if |f(x0)| ≤ 10⁻³⁰ and ‖x0‖∞ ≤ 10⁻³⁰;
α0^0 = 2|f(x0)|/‖g0‖,                  if |f(x0)| > 10⁻³⁰ and ‖x0‖∞ ≤ 10⁻³⁰;
α0^0 = min{1, ‖x0‖∞/‖g0‖∞},            if ‖g0‖∞ < 10⁷ and ‖x0‖∞ > 10⁻³⁰;
α0^0 = min{1, max{1, ‖x0‖∞}/‖g0‖∞},    if ‖g0‖∞ ≥ 10⁷ and ‖x0‖∞ > 10⁻³⁰.

CG_DESCENT (5.3) and CGOPT both use all default parameter values except the stopping conditions. All test methods are terminated if ‖gk‖∞ ≤ 10⁻⁶ is satisfied or the number of iterations exceeds 200,000. SMCG_BB with the generalized Wolfe line search (45) and (41) is written in C. The C code of SMCG_BB and the detailed numerical results can be found at the website http://web.xidian.edu.cn/xdliuhongwei/en/paper.html.

The performance profiles introduced by Dolan and Moré [35] are used to display the performances of the test methods. The numerical experiments are divided into three groups. The first group is run on a PC with a 3.60 GHz CPU (Intel(R) Xeon(R) CPU E5-1650), 64 GB of RAM and Windows 7; the other two groups are run in Ubuntu 10.04 LTS hosted in a VMware Workstation 10.0 installed in Windows 7. In Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12, "Niter", "Nf", "Ng" and "Tcpu" denote the number of iterations, the number of function evaluations, the number of gradient evaluations and the CPU time (s), respectively.
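The Dolan–Moré profile [35] has a compact definition: for each solver s, P_s(τ) is the fraction of problems whose cost is within a factor τ of the best solver on that problem (failures get cost +∞). A minimal sketch:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More profile: T[i, s] = cost (e.g. Tcpu) of solver s on
    problem i, np.inf for failures; returns P[t, s] = fraction of
    problems solved within factor tau of the best solver."""
    best = T.min(axis=1, keepdims=True)    # best cost per problem
    ratios = T / best                      # performance ratios
    return np.array([[np.mean(ratios[:, s] <= tau)
                      for s in range(T.shape[1])] for tau in taus])

# Hypothetical data: three problems, two solvers; solver 0 fails on
# the last problem, so its curve saturates at 2/3.
T = np.array([[1.0, 2.0], [3.0, 3.0], [np.inf, 5.0]])
P = performance_profile(T, taus=[1.0, 2.0])
```

P_s(1) is the fraction of problems on which solver s is (tied for) best, and the height at large τ is the overall success rate, which is how Figs. 1–12 should be read.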

In the first group of numerical experiments, we compare SMCG_BB with CGOPT and CG_DESCENT (5.3) on 80pro_Andrei, with the dimension of each problem set to 10,000. SMCG_BB successfully solves 80 problems, while CGOPT and CG_DESCENT (5.3) successfully solve 79 and 77 problems, respectively. In Fig. 1, we observe that SMCG_BB performs better than CGOPT relative to the number of iterations, and is at a disadvantage only for the case of τ < 1.5 in contrast with


Fig. 1 Performance profile based on Niter (80pro_Andrei)

Fig. 2 Performance profile based on Nf (80pro_Andrei)

Fig. 3 Performance profile based on Ng (80pro_Andrei)


Fig. 4 Performance profile based on Tcpu (80pro_Andrei)

Fig. 5 Performance profile based on Niter (CUTEr)

CG_DESCENT (5.3). In Fig. 2, we see that SMCG_BB outperforms CG_DESCENT (5.3) and has a significant improvement over CGOPT relative to the number of function evaluations. Figure 3 indicates that SMCG_BB has a significant improvement over CG_DESCENT (5.3) and CGOPT relative to the number of gradient evaluations. We see from Fig. 4 that SMCG_BB is much faster than CG_DESCENT (5.3) and CGOPT, and that CG_DESCENT (5.3) is a little faster than CGOPT. This indicates that SMCG_BB is superior to CG_DESCENT (5.3) and CGOPT for 80pro_Andrei.

In the second group of numerical experiments, we compare SMCG_BB with CGOPT on the 145 test problems in the CUTEr library [9]. The names and dimensions of the 145 test problems are the same as those of the numerical results in [36]. SMCG_BB successfully solves 140 problems, and CGOPT successfully solves 134 problems, which is 6 problems fewer than SMCG_BB. As shown in Figs. 5, 6, 7 and 8, SMCG_BB is superior to CGOPT for the 145 test problems in the CUTEr library.


Fig. 6 Performance profile based on Nf (CUTEr)

Fig. 7 Performance profile based on Ng (CUTEr)

Fig. 8 Performance profile based on Tcpu (CUTEr)


Fig. 9 Performance profile based on Nf (CUTEr)

In the third group of numerical experiments, we compare SMCG_BB with CG_DESCENT (5.3) on the 145 test problems in the CUTEr library [9]. The test problems are the same as those in the second group. SMCG_BB successfully solves 140 problems, while CG_DESCENT (5.3) successfully solves 144 problems. As shown in Fig. 9, SMCG_BB is at a disadvantage relative to the number of function evaluations. However, we see from Fig. 10 that SMCG_BB is better than CG_DESCENT (5.3) relative to the number of gradient evaluations. Figure 11 indicates that SMCG_BB slightly outperforms CG_DESCENT (5.3) relative to Nf + 3Ng. In Fig. 12, we see that SMCG_BB is a little faster than CG_DESCENT (5.3). It follows from the results in Sect. 4 that SMCG_BB with the generalized Wolfe line search (45) and (41) is globally convergent, whereas there is no guarantee for the global convergence of CG_DESCENT with the very efficient approximate Wolfe line

Fig. 10 Performance profile based on Ng (CUTEr)


Fig. 11 Performance profile based on Nf + 3Ng (CUTEr)

Fig. 12 Performance profile based on Tcpu (CUTEr)

search [37]. This indicates that SMCG_BB is competitive with CG_DESCENT (5.3) for the 145 test problems in the CUTEr library.

    6 Conclusions

In this paper, we present an efficient Barzilai–Borwein conjugate gradient method for unconstrained optimization problems. Using the idea of the BB method and some properties of the linear CG method, we derive the new search direction based on (6), and design a new strategy for the choice of initial stepsize. We also develop a generalized Wolfe line search, which is nonmonotone and can avoid a numerical drawback of the original Wolfe line search. Numerical results show that SMCG_BB is superior to CGOPT for 80pro_Andrei and the CUTEr library, and SMCG_BB is also superior to


CG_DESCENT (5.3) for 80pro_Andrei and is competitive with CG_DESCENT (5.3) for the CUTEr library.

    Future work is to extend SMCG_BB to non-smooth optimization.

Acknowledgements We would like to thank the anonymous referees and the editor for their valuable comments. We also would like to thank Professor Y. H. Dai, Dr. Caixia Kou and Dr. Weikun Chen for their help in the numerical experiments, and thank Professors W. W. Hager and H. C. Zhang for their C code of CG_DESCENT (5.3). This research is supported by the National Science Foundation of China (No. 11461021), Guangxi Science Foundation (Nos. 2015GXNSFAA139011, 2017GXNSFBA198031), Shaanxi Science Foundation (No. 2017JM1014), Scientific Research Project of Hezhou University (Nos. 2014YBZK06, 2016HZXYSX03) and Guangxi Colleges and Universities Key Laboratory of Symbolic Computation and Engineering Data Processing.

    References

1. Yuan, Y.X.: A review on subspace methods for nonlinear optimization. In: Proceedings of the International Congress of Mathematicians 2014, Seoul, Korea, pp. 807–827 (2014)

2. Yuan, Y.X., Stoer, J.: A subspace study on conjugate gradient algorithms. Z. Angew. Math. Mech. 75(1), 69–77 (1995)

3. Andrei, N.: An accelerated subspace minimization three-term conjugate gradient algorithm for unconstrained optimization. Numer. Algorithms 65(4), 859–874 (2014)

4. Yang, Y.T., Chen, Y.T., Lu, Y.L.: A subspace conjugate gradient algorithm for large-scale unconstrained optimization. Numer. Algorithms 76(3), 813–828 (2017)

5. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

6. Dai, Y.H., Kou, C.X.: A Barzilai–Borwein conjugate gradient method. Sci. China Math. 59(8), 1511–1524 (2016)

7. Dai, Y.H., Kou, C.X.: A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 23(1), 296–320 (2013)

8. Hager, W.W., Zhang, H.C.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16(1), 170–192 (2005)

9. Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEr and SifDec: a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29(4), 373–394 (2003)

10. Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. 10, 147–161 (2008)

11. Fletcher, R., Reeves, C.: Function minimization by conjugate gradients. Comput. J. 7(2), 149–154 (1964)

12. Hestenes, M.R., Stiefel, E.L.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)

13. Polyak, B.T.: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 9, 94–112 (1969)

14. Dai, Y.H., Yuan, Y.X.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10(1), 177–182 (1999)

15. Dai, Y.H., Yuan, Y.X.: Nonlinear Conjugate Gradient Methods. Shanghai Scientific and Technical Publishers, Shanghai (2000)

16. Hager, W.W., Zhang, H.C.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2(1), 35–58 (2006)

17. Dong, X.L., Liu, H.W., He, Y.B.: A self-adjusting conjugate gradient method with sufficient descent condition and conjugacy condition. J. Optim. Theory Appl. 165(1), 225–241 (2015)

18. Zhang, L., Zhou, W.J., Li, D.H.: Global convergence of a modified Fletcher–Reeves conjugate method with Armijo-type line search. Numer. Math. 104(4), 561–572 (2006)

19. Dong, X.L., Liu, H.W., He, Y.B.: A modified Hestenes–Stiefel conjugate gradient method with sufficient descent condition and conjugacy condition. J. Comput. Appl. Math. 281, 239–249 (2015)

20. Babaie-Kafaki, S., Reza, G.: The Dai–Liao nonlinear conjugate gradient method with optimal parameter choices. Eur. J. Oper. Res. 234(3), 625–630 (2014)

21. Andrei, N.: Another conjugate gradient algorithm with guaranteed descent and the conjugacy conditions for large-scale unconstrained optimization. J. Optim. Theory Appl. 159(3), 159–182 (2013)

22. Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13(3), 321–326 (1993)

23. Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22(1), 1–10 (2002)

24. Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7(1), 26–33 (1997)

25. Liu, Z.X., Liu, H.W., Dong, X.L.: An efficient gradient method with approximate optimal stepsize for the strictly convex quadratic minimization problem. Optimization 67(3), 427–440 (2018)

26. Biglari, F., Solimanpur, M.: Scaling on the spectral gradient method. J. Optim. Theory Appl. 158(2), 626–635 (2013)

27. Liu, Z.X., Liu, H.W., Dong, X.L.: A new adaptive Barzilai and Borwein method for unconstrained optimization. Optim. Lett. 12(4), 845–873 (2018)

28. Dai, Y.H., Yuan, J.Y., Yuan, Y.X.: Modified two-point stepsize gradient methods for unconstrained optimization problems. Comput. Optim. Appl. 22(1), 103–109 (2002)

29. Liu, Z.X., Liu, H.W.: An efficient gradient method with approximate optimal stepsize for large-scale unconstrained optimization. Numer. Algorithms 78(1), 21–39 (2018)

30. Liu, Z.X., Liu, H.W.: Several efficient gradient methods with approximate optimal stepsizes for large scale unconstrained optimization. J. Comput. Appl. Math. 328, 400–413 (2018)

31. Yuan, Y.X.: A modified BFGS algorithm for unconstrained optimization. IMA J. Numer. Anal. 11(3), 325–332 (1991)

32. Andrei, N.: Open problems in nonlinear conjugate gradient algorithms for unconstrained optimization. Bull. Malays. Math. Sci. Soc. 34(2), 319–330 (2011)

33. Zhou, B., Gao, L., Dai, Y.H.: Gradient methods with adaptive stepsizes. Comput. Optim. Appl. 35(1), 69–86 (2006)

34. Zhang, H.C., Hager, W.W.: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14(4), 1043–1056 (2004)

35. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

36. Hager, W.W., Zhang, H.C.: The limited memory conjugate gradient method. SIAM J. Optim. 23(4), 2150–2168 (2013)

37. Hager, W.W., Zhang, H.C.: Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Softw. 32(1), 113–137 (2006)
