Path-Following Methods for Linear Programming … · SIAM REVIEW ( 1992 Society for Industrial and...

59
Society for Industrial and Applied Mathematics is collaborating with JSTOR to digitize, preserve and extend access to SIAM Review. http://www.jstor.org Path-Following Methods for Linear Programming Author(s): Clovis C. Gonzaga Source: SIAM Review, Vol. 34, No. 2 (Jun., 1992), pp. 167-224 Published by: Society for Industrial and Applied Mathematics Stable URL: http://www.jstor.org/stable/2132853 Accessed: 21-09-2015 16:22 UTC Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/ info/about/policies/terms.jsp JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTC All use subject to JSTOR Terms and Conditions

Transcript of Path-Following Methods for Linear Programming … · SIAM REVIEW ( 1992 Society for Industrial and...

Society for Industrial and Applied Mathematics is collaborating with JSTOR to digitize, preserve and extend access to SIAM Review.

http://www.jstor.org

Path-Following Methods for Linear Programming Author(s): Clovis C. Gonzaga Source: SIAM Review, Vol. 34, No. 2 (Jun., 1992), pp. 167-224Published by: Society for Industrial and Applied MathematicsStable URL: http://www.jstor.org/stable/2132853Accessed: 21-09-2015 16:22 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/ info/about/policies/terms.jsp

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

SIAM REVIEW ( 1992 Society for Industrial and Applied Mathematics Vol. 34, No. 2, pp. 167-224, June 1992 001

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING* CLOVIS C. GONZAGAt

Abstract. In this paper a unified treatment of algorithms is described for linear programming methods based on the central path. This path is a curve along which the cost decreases, and that stays always far from the boundary of the feasible set. Several parameterizations of this curve are described in primal and primal- dual problems, and it is shown how different algorithms are obtained by following the curve using different parameterizations. Polynomial algorithms are obtained by following the curve approximately, and this concept becomes precise by using explicit rules for measuring the proximity of a point in relation to the central path.

Key words. linear programming, interior point methods, path-following algorithms

AMS(MOS) subject classification. 49D

1. Introduction. In this paper we study a family of algorithms for solving the linear programming problem

minimize cTx

(P) subject to Ax = b

x > 0,

where c E El, b E E', A is a full-rank m x n matrix, n > m. We assume that the feasible region

S = {X E In I Ax = b,x > O}

is bounded and has a nonempty relative interior given by

So = {X E 1Wn I Ax = b,x > O}.

The linear programming problem was first solved by Dantzig [14] forty years ago. The simplex method developed by him is still the most widely used algorithm, and it will possibly remain so in the future. Although the simplex method is efficient and elegant, it does not possess a property that became more and more charming in the last two decades: polynomial complexity. In fact, a problem devised by Klee and Minty [60] forced the simplex method to execute a number of arithmetical operations that grew exponentially with the number of variables of the problem, attaching to the method an exponential worst-case complexity.

The question on whether a polynomial algorithm for the linear programming prob- lem exists was answered in 1978 by Khachiyan [58], [59]. He applied the ellipsoidal method of Shor [102] and Yudin and Nemirovskii [123] to the linear programming prob- lem and proved a polynomial bound on the number of arithmetical operations needed to find an optimal solution. The bound, O(n4L), depends on a number L, the length of the input (total number of bits used in the description of the problem data), which is some- what frustrating. The existence of a "strongly polynomial" algorithm, i.e., a method with a complexity bound based only on the number of variables and constraints, is still a diffi- cult open problem. The method raised an enormous enthusiasm, and had a great impact

*Received by the editors September 4, 1990; accepted for publication (in revised form) May 10, 1991. tCOPPE, Federal University of Rio de Janeiro, C. Postal 68511, 21945 Rio de Janeiro, RJ, Brazil (email:

gonzagaQbrlncc . bitnet).

167 This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

168 CLOVIS C. GONZAGA

on the theory of complexity, but unfortunately the practical implementations have been irremediably inefficient.

For comprehensive studies of these two approaches, see for instance Dantzig [15], Schrijver [100], and Goldfarb and Todd [32]. Complexity issues are discussed in Shamir [101], Megiddo [70], [71], Bland, Golfarb, and Todd [12], Borgwardt [13], and Tardos [107].

In 1984, Karmarkar [55] published his algorithm, which not only had a polynomial complexity bound of O(n3-5L) operations, lower than Khachian's, but was announced as more efficient than the simplex method. There was initially much discussion about this claim, but now it is clear that well-coded versions of the new methodology are very efficient, especially when the problem size increases above some thousands of variables. Karmarkar's algorithm is essentially different from the simplex method in that it evolves through the (relative) interior of the feasible set, instead of following a sequence of ver- tices as does the simplex method. Karmarkar's algorithm has a flavor of nonlinear pro- gramming, in contrast with the combinatorial gait of the simplex method.

Karmarkar's algorithm in its original form needed a special formulation of the linear programming problem, and relied on the knowledge of the value of an optimal solution, or a process for generating efficient lower bounds for it. Soon standard-form variants were developed by Anstreicher [4], Gay [24], Gonzaga [34], Steger [106], and Ye and Kojima [121], and an efficient method for generating lower bounds for the optimal cost was devised by Todd and Burrell [108]. Another approach for finding lower bounds was developed by Anstreicher [3].

A thorough simplification of Karmarkar's algorithm reproduces the algorithm due to Dikin [16], [17], which now received the name of "affine-scaling." This method will be briefly discussed in ?3.2. Karmarkar's algorithm, its variants and implementations, are discussed in Goldfarb and Todd [32]. We describe a variant of Karmarkar's algorithm in ?3.6.

Our concern starts from the fact that Karmarkar's algorithm performs well by avoid- ing the boundary of the feasible set. And it does this with the help of a classical resource, first used in optimization by Frisch [22] in 1955: the logarithmic barrier function:

n

X E ff?n X > O ~-p(X) = Elog xi.

i=l

This function grows indefinitely near the boundary of the feasible set S, and can be used as a penalty attached to those points. Combining p(.) with the objective makes points near the boundary expensive, and forces any minimization algorithm to avoid them.

A question is then naturally raised: how far from the boundary should one stay? A successful answer was given through the definition of the analytic center of a polytope by Sonnevend [103], the unique point that minimizes the barrier function. A well-behaved curve is formed by the analytic centers of all the constant-cost slices of the feasible set in (P): the central path. This is the subject of this paper. This path is a region with some very attractive primal-dual properties, and provides an answer to our question: try to stay near the central path. Renegar did so in 1986 [96], and obtained the first path- following algorithm, with a complexity lower than that of Karmarkar's method in terms of number of iterations (O(V/iL) against O(nL)). Renegar's approach was based on Huard's method of centers [50].

Soon afterwards Vaidya [111], refining Renegar's results, and Gonzaga [36], fol- lowing a penalty function approach, described algorithms with an overall complexity of O(n3L) arithmetical operations, limit that is still standing. Simultaneously, Kojima,

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 169

Mizuno, and Yoshise [65] developed a primal-dual path-following method, which was soon to be reduced to that low complexity by the same authors [64] and by Monteiro and Adler [89]. A fourth approach based on Karmarkar's potential function appeared later, first in Ye [118] and then in Freund [19], and in Gonzaga [41].

Only proven complexity results were cited in the brief historical account above. An amazing fact has been found out by Anstreicher [6]: the classical barrier function method (SUMT) developed for nonlinear programming by Fiacco and McCormick [18], exactly as implemented in 1968, solves linear and quadratic programs in O(V/niL log L) itera- tions.

The field of interior point methods has been extremely active in the last few years. Over a hundred papers were written, developing the four approaches for linear pro- gramming, extending them to convex quadratic programming, to linear complementarity problems and to convex nonlinear programming. Path-following methods, which started as short-steps algorithms with nice theoretical properties, evolved into practical large- steps methods.

The purpose of this paper is to describe a unified treatment of central path al- gorithms, and to show how one arrives naturally at the four approaches commented above (methods of centers, penalty function methods, potential reduction methods, and primal-dual methods). We shall see that the good properties of points near the central path are intimately associated to nice primal-dual properties at these points, and this will provide the unifying concepts for the whole theory. And not surprisingly, we shall in the end be able to abandon the central path and work directly with primal-dual properties, while keeping all the nice properties of path-following methods.

Proofs will be given only for some results. We hope to achieve the goal of provid- ing a complete treatment of one approach (penalty function methods), and to pave the way for straightforward analyses of the other ones. We shall restrain ourselves to linear programming, and we do not intend to make a survey of the field in this paper: it should rather be considered as a tutorial on the basic techniques.

Organization of the paper. The next section is an informal overview of the geometri- cal aspects of the methods. Section 3 describes the linear programming problem and the main tools used in interior point methods, including a variant of Karmarkar's algorithm with the Todd-Burrell lower bound updating procedure. Section 4 describes the central path and conceptual path-following algorithms, which assume that exact points on the path are computed by an oracle. The treatment stresses the similarities among several approaches and ends by a complexity theorem for conceptual algorithms. Sections 5 and 6 discuss nearly central points and centralization algorithms, allowing the construction of computationally implementable algorithms, in which only points near the central path are allowed. The specialization of these algorithms to the various approaches (penalty function methods, methods of centers, potential reduction methods, and primal-dual methods) is described in detail in ??7, 8, and 9. Section 10 has references to topics not covered in this paper and to extensions of the approach to nonlinear problems.

Notation. We shall work with column vectors and matrices denoted, respectively, by lower case and upper case letters. Different vectors will be denoted by superindices; subindices will denote components of a vector. These are some special conventions: For a vector like x, xk, z, the corresponding upper case symbols X, Xk, Z will denote the diagonal matrix formed by the vector's components. Given a vector x E ERn the notation x- will be used for the vector with components x21, i = 1,***, n.

The letter e will denote the vector of ones, e = [1 ... j]T, with dimension indicated by the context.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

170 CLOVIS C. GONZAGA

For future reference, here is a listing of the main symbols used in the text:

e = [1 ... 1]T X = diag(xi, **x,,n) 1R+, 1R++: nonnegative and positive vectors in Rfn.

x, w, z, A, b, c: variables and data for (P) and (D) (?3.1). y, w, z, A, b, c: variables and data for scaled problems (?3.2). x, w, z: optimal solutions for (P) and (D) (?3.1). v = c x: optimal value (?3.1). S, SO: feasible set for (P) and its relative interior (?3.1). Z, ZO: set of feasible dual slacks and its relative interior (?3.1). PM, PM: projection matrix into Af(M) and its orthogonal complement (?3.1). rp = Pr: projection of r into subspace in the context (?3.1). p(.): barrier function (?3.3). p X x(p): generic parameterization of the central path (?4.2). p z(p): generic parameterization of the dual central path (?4.2). p - v(p): dual cost (lower bound) associated to p (?4.2). a: penalty multiplier (18). K: upper bound for the cost (19). v: lower bound for the cost (20). fo(), fK( ), fv(.): penalized, center and potential functions (?3.4). q: fixed multiplier, q > n (?3.4). X: analytic center of S (?3.3). h(x, p) = X- Ih(x, p): SSD direction from e in scaled space (22). h(x, p): SSD direction for fp(-) from x (23). 6(x, p) = IIh(x, p)II: proximity measure from x to x(p) (40). v(p), z(p), Ai(p): lower bound, dual slacks, and gap associated to central points (?4.1). v(x, p), z(x, p), A (x, p): lower bound, dual slacks, and gap associated to nearly central points (42). v5(x), z4x), A(x): best guesses for lower bound, dual slacks, and gap at x (?3.5).

2. An overview of central path methods. This section discusses very informally ge- ometrical aspects of the central path. Precise definitions, properties, and references will be given later.

Let us start by observing the barrier function

n n

x E JRn, X > 0 F-4p(x) =-E logxi =-log J xi. i=1 i=1

This function penalizes variables that approach zero, and hence penalizes points near the boundary of S. The unique minimizer of p(.) in SO is the analytic center of S, and it coincides with the point that maximizes the product of the variables in S. Figure 2.1 illustrates the center for a simple problem.

Much of the paper will be dedicated to showing that Newton-Raphson's method can be adapted to the determination of a center with a prescribed precision with excellent theoretical and practical performance. For the time being, let us assume that an exact solution is at hand.

The main idea behind all interior point methods is that costs should try to be de- creased and simultaneously move away from the boundary. As is natural to do in the

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 171

FIG. 2.1. Level curves and values for the product of variables in S.

face of competing objectives, we shall examine combinations of cost and barrier function, in a traditional construction known as internal penalized function:

a E R, x EFH fa (x) = ac Tx +P(X).

This function was extensively studied by Fiacco and McCormick in their book [18], and described since then in all nonlinear programming textbooks.

Now associate to each value of the parameter a a centralpoint x (a) uniquely defined by

(1) x((a) = argminfa (x). xESO

The curve a e 1f? F-* x (a) is the central path for the linear programming problem (P). The curve is smooth and has the important property that as a increases it converges to an optimal solution of (P).

There are several different descriptions of the central path as we shall see below. Each description corresponds to a different parameterization of the curve. One of them has a simple geometrical interpretation: consider a central point

x(a) = argmin{accTx + p(x)}. xESO

This point obviously solves the problem obtained by constraining the cost to its value at x(a), that is,

x(a) = argmin{p(x) I cTx = cTx(a)}. xESO

This problem describes the analytic center of a constant-cost slice of the original feasible set S, and this is illustrated by Fig. 2.2.

Path-following algorithms follow the central path. Let p e (w, w+) F-* x(p) be a parameterization of the central path, with w+ > w-, w+ possibly infinite. All algorithms follow the model below.

ALGORITHM 2.1. Conceptual path-following: given xo e so, po e (w-, w+), with x? = x(po).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

172 CLOVIS C. GONZAGA

FIG. 2.2. The central points are the analytic centers of the constant-cost slices of S.

k := 0.

REPEAT Choose Pk+1 > Pk Call an internal minimization algorithm to find xk+1 := x(pk+1). k := -k + 1.

UNTIL convergence.

As it is, the model simply generates a sequence of independent central points. Actual algorithms will depend on the parameterization, the initialization (choice of po and x?), and what is more important, the criterion for updating the parameter.

The parameterization and the updating rule characterize different algorithms. As an example, the penalty function approach uses the parameterization in (1) and updates the parameter by ak+1 := (1 + p)ak, where p is a positive constant.

The internal algorithm is essentially the same for all methods. Finding a central point is a minimization problem with a nonlinear objective function with a simple Hes- sian matrix, and a natural choice is the algorithm of Newton-Raphson. We shall use in all cases an algorithm to be discussed in the next section, called scaling-steepest descent (SSD). This method is in some cases exactly equivalent to Newton-Raphson. The crucial point to be discussed in relation to the internal algorithm is its stopping rule.

It is impossible to find a central point exactly in finite time, and we want to construct polynomial algorithms. Also from the practical point of view the internal algorithm should be terminated as soon as possible. We must then renounce to the determina- tion of central points, and work "near" the central path. Precise criteria must be defined for considering a point "near" a central point, and this will be the object of ?5.

Assuming that we do have a good criterion for measuring proximity, our model can be improved a little, as follows. Figure 2.3 illustrates the behavior of algorithms in this model.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 173

7~~~~~ /~~~

/ 2

Ix I ./; : / /

FIG. 2.3. Short and large step path-foUlowing algorithms.

ALGORITHM 2.2. Implementable path-following: given xo e So, po e (w, w+), with x? near x(po).

k := 0.

REPEAT Choose Pk+1 > Pk Call an internal minimization algorithm to find xk+l near X(pk+l). The algo- rithm starts at xk. k := k + 1.

UNTIL convergence.

Figure 2.3 shows two possible combinations of parameter updates and proximity criteria. In the first case, a short-step update forces the algorithm to trace the path, so that all points generated are near the path. In this case, the internal algorithm usually executes exactly one iteration per iteration of the main algorithm. In the second case, the parameter is updated by large steps, and several iterations of the internal algorithm are needed to approach the central point corresponding to Pk+1*

3. Tools and non-path-following methods. This section establishes the main facts and definitions that compose the language of interior point methods in general.

3.1. The linear programming problem. The linear programming problem (P) was already stated in ?1. We chose the format with equality constraints and nonnegative variables, but the equivalent format with inequality constraints and unrestricted variables could have been chosen as well. In fact, there are simple rules for "translating" results from one formulation into the other (see Gonzaga [35]). The extension to inequality constraints of the barrier function and of the notion of analytic center is straightforward by using slacks.

We assume as above that the constraint matrix A is full-rank, and that the feasible set S is bounded with nonempty relative interior SO. These assumptions can be relaxed: we only need a bounded optimal set for most results, but this generalization affects the simplicity of the results.

We shall also assume that an initial interior point xo E SO is at hand. This assump- tion is in practice replaced by an initialization procedure that modifies the problem. A

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

174 CLOVIS C. GONZAGA

typical procedure is a big-M method, like the one discussed in Adler, Karmarkar, Re- sende, and Veiga [2].

With these hypotheses, the problem has an optimal solution x (not necessarily unique), and the optimal value of the problem will be denoted by

(2) v= x.

Figure 3.1 illustrates problem (P). The figure shows the projection of c into Af(A), the null space of A. Projected vectors will play an important role in interior point meth- ods, and we shall take a little space to review the concept of projection.

cPl

\ if/

X AX = b /

FIG. 3.1. The linear programming problem.

Two subspaces of 1RI are associated to the linear transformation represented by A: the null space K(A) = {X E 1Rn I Ax = O}, and its orthogonal complement, the range space of AT, defined by IZ(AT) = {x eE 1n I x = ATW, W E jRm}. Any vector d E 1Rn can be uniquely decomposed as d = dP + dP, where dP E Af(A) and iP E IZ(AT). dp and dP are, respectively, the projection of d into K(A) and its orthogonal complement.

Since the projection operator is linear, it can be represented by a matrix PA, such that dP = PAd. The orthogonal complement will be dP = PAd, where PA = I - PA.

If A is a full-rank matrix, then there is a closed formula for the projection matrix:

(3) PA = I - AT(AAT)-1A.

The projection of d into K(A) is the point in K(A) with smallest Euclidean distance to d. This is actually the most usual definition of projection:

(4) dP = argmin{lx - dll I x E AS(A)}.

Similar statements are associated to the orthogonal complement:

(5) dP = argmin{lizil I z = d - ATw,w E Rim} z

(6) ildpli = min{ ld - ATwII I w E lm}.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 175

The optimal set for a problem (P) does not change if we replace the objective by cp'x. The importance of the projected cost cp should be clear: it provides the steepest ascent direction for the cost from an interior point. Given any differentiable function f: SO -+ JR and an interior point x, the steepest descent direction for f(.) from x is -PAVf(x).

Remark on notation. Given any matrix M, the projection matrix into Af(M) will be denoted by PM. Whenever no confusion is possible, we use the simplified notation P for the projection matrix, and then the projection of a vector r will be denoted by rp _ Pr -PMr.

Dual problem. The dual problem associated to (P) is

maximize bTw

(D) subject to ATw + z = c

z > 0.

The variables z E DRn are called dual slacks. Under our hypotheses, (D) has an optimal solution (wz, z) (not necessarily unique), and bTw V v.

The duality gap. Given any pair (x, z), where x E S and (w, z) is feasible for (D) for some w E #R',

xTz = cTx-bTW.

This is a well-known fact, which can be proved by direct substitution of z = c - ATw. Note that optimality is equivalent to xTz = 0: this is the theorem of complementary slacks.

The dual problem seems to have too many variables. In fact, the variables w can be eliminated, leading to a very convenient symmetrical primal-dual pair. This has been thoroughly studied by Todd and Ye [110], and we use here a very simple reduction pro- cedure.

LEMMA 3.1. z E 1n is a feasible dual slackfor (D) if and only if z > 0 and PAz = PAC Proof. Consider a vector z > 0. Then z is a feasible dual slack if and only if for some

w E JRm,

c - z = ATW.

But c - z can be decomposed in a unique way as

c-Z = PA(c-z) +?ATW,

and it follows from the comparison of the two expressions above that PA(c- z) = PAC- PAZ = 0, completing the proof. [1

This lemma is very interesting: it provides a simple rule for testing the feasibility of a dual slack. Some conclusions can be obtained now.

Given any point x E S, an equivalent dual problem (in the sense that the objective differs by a constant at all feasible points) can be written as

minimize xT z

(7) subject to PAZ = PAC

z > 0.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

176 CLOVIS C. GONZAGA

Here the objective is the duality gap, and the optimal value is z = cTI - v. Similarly, the primal problem can also be modified: its objective can be replaced

by cTx as we saw above, or by zTx for any feasible dual slack z. The equivalent primal problem will be:

minimize ;Tx

(8) subject to Ax = b

x > 0,

Notation. The dual feasible set for (7) and its relative interior will be defined as

Z = {ZEJR|PAZ=PAC,Z>?O},

Z?= {ZE1Rn|PAZ=PAC,z>O}.

3.2. The scaling-steepest descent algorithm. A scaling transformation on problem (P) is a change of variables x = Dy, where D is a positive diagonal matrix. Given a point x? E S, scaling about x? is the scaling transformation x = Xoy, where according to our notational convention, Xo = diag(x?, , xO). The linear programming problem scaled about x? will be

minimize ET y

(SP) subject to Ay = b

y > 0,

where A = AXO, c = Xoc are obtained by substitution of x := Xoy in (P). The point x? is transported to e, the vector of ones.

Scaling affects dual variables in a simple way. LEMMA 3.2. (w, z) is a feasible dual solution for (P) if and only if (w, Xoz) is a feasible

dual solution for (SP). Proof. (w, z) is feasible for (P) if and only if

ATw+z=c z > O.

Multiplying by the positive diagonal matrix Xo,

(AXo)Tw + XOz = XOC, Xoz > 0.

This characterizes a dual feasible pair (w, Xoz) for (SP), completing the proof. O Remark on notation. The primal variables in scaled problems will be either y or x.

All other entities associated to the scaled problem will be indicated by a bar. There are several reasons why scaling is very useful. It obviously does not change

problem (P), and so it is in principle innocuous. The first reason why we shall always work with scaled problems is that it simplifies the expressions in most of the theory and procedures to be studied, yielding very clear formulas.

The second reason is that scaling does affect the steepest descent direction. And it does so in a clever way, as we show now.

Consider a generalization of problem (P) for a differentiable objective f(I):

minimize f (x),

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 177

and a point xo E SO. The steepest descent direction was studied by Cauchy around 1840: it is the direction that solves the minimization of the linear approximation of f (.) about x? over a unit ball centered in x?,

(9) minimize{Vf(xO)Td I Ildil < 6, d E Af(A)}.

The optimal solution stems, as a consequence of the Cauchy-Schwarz inequality, and is always a multiple of

(10) h=-PAVf(x0).

The steepest descent direction may be very inefficient in constrained problems, as we illustrate in Fig. 3.2 for a very simple problem, with S = 1R2. The steepest descent computation is actually what is known today as a trust-region minimization: a simple objective (linear approximation) is minimized in a simple region (a ball) to obtain a hint on the behavior of the function around the point. A ball is an obvious choice for trust region because it is easy (all one needs is a projection) and democratic (no directions are favored).

The presence of positivity constraints spoils the second advantage, and motivates the search for an easy region capable of reflecting the shape of the region of interest more precisely. The easiest large shape available is the largest possible simple ellipsoid in the positive orthant, shown in Fig. 3.2. The ellipsoid, with axes parallel to the coordinate axes (and hence simple), provides a large trust region when intersected with S.

FIG. 3.2. Trust region minimization in Cauchy and SSD algorithms.

Scaling the problem about x? deforms this ellipsoid into a ball centered at e, and hence the solution of the trust region minimization is obtained by scaling followed by the projection of the resulting gradient vector (see Fig. 3.3).

This analysis results in a general first-order interior trust region minimization algo- rithm that can in principle be used for any continuously differentiable objective function. This algorithm will be called scaling-steepest descent (SSD), and will be the minimization method used in most of this paper (primal-dual algorithms will use a slightly different scaling).

ALGORITHM 3.3. (SSD): given xo E SO, f: SO -> JR continuously differentiable.

k := 0.

REPEAT Scaling: A := AXk, g xkVf(xk).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

178 CLOVIS C. GONZAGA

FIG. 3.3. Affine-scaling trust regions.

Direction: h :=-PA9- _

Line search: y := e + Ah, y > 0. Scaling: xk+l Xky. k := k+ 1.

UNTIL convergence.

The scaling transformation transports xk to the vector e. The direction h minimizes the linear approximation of y - f(y) = f(X-1y) in a ball (corresponding to the largest simple ellipsoid in the original space). The line search along h is not specified here: it is usually an approximate minimization of f(.) along h, perhaps with a heuristic procedure to avoid the boundary (not needed if the barrier function is present).

An amazingly efficient algorithm for linear programming is obtained by the direct application of SSD to the original problem (P). This is the method known as affine- scaling, first proposed by Dikin in 1967 [16]. Dikin took always a step of length one in the line search, i.e., A = 1/1lhll. Other researchers used large steps, a fixed percentage (above 95 percent) of the maximum possible steplength in the positive orthant. Like in any interior point algorithm, the computational work is concentrated in the projection operation needed in each iteration.

The affine-scaling algorithm is naturally obtained as a simplified variant of Kar- markar's algorithm, and was rediscovered along this path by several authors: Barnes [9], Vanderbei, Meketon, and Freedman [115]. The algorithm is globally convergent for problems with no primal degeneration, as Dikin proved in 1974 [17] for unit steplengths. His proof was improved and clarified by Vanderbei and Lagarias [114], and extended to large steps by Gonzaga [39]. The method has been successfully implemented by many groups, like Adler, Karmarkar, Resende, and Veiga [2] and Monma and Morton [84]. See Goldfarb and Todd [32] for a discussion of implementations.

The search direction. We wrote the SSD algorithm using an explicit scaling op- eration at each iteration. This is not needed, since scaling was only a method for the

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 179

trust region minimization; the search direction can be explicitly expressed in the original space, and it is easy to see that it is given by

(11) h = Xkh = -XkPAXk Xkvf (x)

The SSD algorithm can be written directly in the original space. ALGORITHM 3.4. (SSD): given xo E S0, f: S0 -* R continuously differentiable.

k := 0.

REPEAT Direction: h == -XkPAXk XkVf (Xk k). Line search: xk+l := xk + Ah, xk+l > 0. k := k+ 1.

UNTIL convergence. Since the use of these explicit expressions tends to produce hard-to-read mathematics, we shall frequently do scaling and work in the transformed space.

The affine-scaling iterations, for an example, are illustrated in Fig. 3.3. The ellip- soidal trust regions are shown in original space: they correspond at each point to the intersection of S and the largest simple ellipsoid in BR+ centered at the point. Here we took unit steplengths.

3.3. The barrier function and the analytic center. The affine-scaling algorithm needs interior points to generate nice ellipsoidal trust regions, and obtains this feature by re- stricting the steplength. A step of unit length is inefficient, and a fixed percentage of the maximum step is far from elegant. Although it avoids the boundary, the points may accumulate near it. There are good reasons to believe that the resulting algorithm is not polynomial (see Megiddo and Shub [73]).

An elegant way of actively avoiding the boundary will be obtained by defining a cen- ter for the polytope S, and by the simultaneous consideration of two objectives: reducing costs and centering. We now forget the cost for a while and turn to the problem of finding a "center" for the polytope S.

The best possible definition of center is probably the center of gravity, but its compu- tation is known to be very difficult, more difficult than the linear programming problem itself. Another nice center is the center of an ellipsoid of maximum volume inscribed in S. Although this ellipsoid has been computed in polynomial time by Khachiyan and Todd [57], it is still too difficult. These are geometrical centers. A third good center has been defined by Vaidya [113], and can also be computed in polynomial time for a given precision: the volumetric center is like before the center of a maximum volume ellipsoid in S, but the maximum is taken among the simple ellipsoids (intersections of S and ellipsoids with axes parallel to the coordinate axes). For the time being it is also too difficult for practical methods.

At this time, the most useful center is the analytic center, defined by Sonnevend [103]: The analytic center of S is the unique point given by

(12) X = argminp(x). xESO

Approaching the analytic center (centering) depends on a good understanding of the barrier function. The notational conventions explained in the introduction will be used in the study of its properties. The barrier function p: ffn -* fR is defined by

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

180 CLOVIS C. GONZAGA

n

p(x) = -Elog xi, i=l

and has derivatives

(13) Vp(x) = -x-l, Vp(e) =-e, (14) V2p(x) = X`2 V2p(e) = I.

Convexity. Since V2p(x) is positive definite in SO, p(.) is strictly convex. Besides this, p(x) grows indefinitely as x approaches the boundary of S, and thus the analytic center is well defined.

Effect of scaling. Consider a positive diagonal matrix D. We have n

p(Dx) = p(x) - log di. i=l

Given two points x1, x2 > 0,

p(Dx2) - p(Dxl) = p(X2) -p(Xl)

and hence scaling operations do not affect variations of p(.). Here we see another reason for the use of scaling: while not affecting variations of

the barrier function, scaling yields extremely easy derivatives. Still more striking is the fact that at e the Hessian matrix is the identity, with the consequence that the steepest descent direction from e coincides with the Newton-Raphson direction. Hence, the scaling-steepest descent algorithm and Newton-Raphson's method with line searches coincide for the barrier function.

Linear approximations arounde. In our study of the efficiency of algorithms we shall use linear approximations of the barrier function. At this point we establish a bound on the error of the linear approximation around e.

We begin by listing some results on the logarithm function around 1. LEMMA 3.5. Let A E (-1, 1) be given. Then

A2 1 (15) A > log(l + A) > A - -

Proof. The first inequality is a direct consequence of the concavity of the logarithm. The second inequality was proved by Karmarkar [55], by developing the logarithm in Taylor's series:

log(l+A) = A--2 +3 4 2 3 4

A2

> A--l2(1?+IAI+A12+...) 2 A 2 1

and the proof is complete. O

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 181

Variation of the barrier function around e. LEMMA 3.6. Consider a vector d E 1fn such that IIdlIk0 < 1. Then

(16) p(e + d) > Vp(e)'d = -e'd.

(17) p(e + d) < Vp(e)'d + 1I1d12 1 -e'd + 1Id12 1 2 1 - lldlIk0 2 1 - lldlli

Proof. We have

n

p(e + d) = l-,1og(1 + di). i=l

Since di e (-1,1) by hypothesis, it is enough to extend the properties (15). The exten- sion is straightforward by adding the inequalities for i = 1 to n. 0

Centering. Since the SSD algorithm coincides with Newton-Raphson's method with line searches for the minimization of the barrier function, it is natural to conclude that either method must be efficient for the determination of the analytic center. The result- ing algorithm is indeed efficient both in theory and in practice: it is the only method used in the literature. Its complexity was studied by Vaidya [112], and will be revised in ?6.

3.4. Auxiliary functions. Given a point in SO, our task is obtaining a better point with respect to two goals: cost improvement and centering. As it is natural when two objectives are present, we take combinations of them.

Following this reasoning, different auxiliary functions are constructed, each one leading to a different family of algorithms. Each auxiliary function uses a parameter that weights in some way the importance given to each of the two objectives. Each aux- iliary function will be associated to one parameterization of the central path, as we shall see in detail in the next section. Here we simply list the functions used in primal methods (primal-dual methods will be examined separately).

(i) The penalized function (Frisch [22], Fiacco and McCormick [181)-parameter a associated to a duality gap:

(18) x E So f(x) = acTx + p(x).

(ii) The center function (Huard [50], Renegar [96])-parameter K, upper bound to the optimal cost; q > n constant:

(19) x E S0 s.t. cT x < K fK (x) = -qlog(K-c Tx) + p(x).

(iii) The potential function (Karmarkar [55])-parameter v, lower bound for the optimal cost; q > n constant:

(20) x E S? fv(x) = qlog(cTX - v) + p(x).

In the notation used for these functions we sacrificed formal precision for simplic- ity: the actual function is singled out by the symbol used for the parameter. We shall dedicate much effort to each of these functions ahead. At this point we want to make some comments on their similarities.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

182 CLOVIS C. GONZAGA

All auxiliary functions have as second term the barrier function, responsible for avoiding the boundary. The first term involves the cost, and the parameter weights both terms. In (i), increasing a increases the importance of the cost term; in (ii), decreasing K increases - log(K - cTx); in (iii) the same effect is obtained by increasing v.

Still more interesting is a comparison of the gradients of these functions at e (e will result from a scaling operation in algorithms), respectively,

_ - e, q - q aec-e, K_-ceCCe - Te-v

The steepest descent directions at e are all combinations of two vectors: -PA C and PAe, respectively, called cost-reduction direction and centering direction. This is actually true for most existent interior point methods: the search directions used by the algorithms are combinations of the cost-reduction and centering directions (for scaled problems).

Another interesting conclusion is that given K or v, it is straightforward to find a value of a such that VfQ (e) coincides with, respectively, VfK (e) or Vfv (e):

(21) a = q K a= c v

Descent directions for the auxiliary functions. Most internal algorithms apply the SSD algorithm to the auxiliary functions with a fixed value of the parameter. Let us introduce some notation for these descent directions.

Given a point x e SO, the descent directions in the original space will be denoted h(x, p), where p is a parameter among a, K, v. The corresponding directions in scaled space will be denoted h(x, p). We have

(22) h(x, p) = -PAXXVfp(x),

(23) h(x, p) = Xh(x, p).

Now let us discuss the relationship between the SSD direction h(x, p) and the Newton- Raphson step (NR step). The main result is for the penalized function: the directions coincide.

LEMMA 3.7. Consider the function fa(.) for a fixed a > 0. The NR step from x coincides with the SSD direction, given by

(i) h(e, a) =-PAVfca(e) = -acp + ep. (ii) h(x, a) =-XPAXXVfa (x) = -XPA (a - e)a Proof. Assume initially that x = e. Then the quadratic approximation of f]. (.) about

e has derivatives given by

VfcQ (e + h) VfcQ (e) + V2fcQ(e)h = VfQ,f(e) + Ih,

since V2fQ, (e) = V2p(e) = I. The NR step corresponds to the minimizer of the quadratic approximation, obtained by setting the projected gradient to zero:

PVfQ (e) + h(e, a) = 0,

completing the proof of (i). To prove (ii), note that the NR algorithm is scale invariant. In fact, the quadratic

approximation of a function does not depend on the metric of the space (in contrast

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 183

with the norm-dependent steepest descent direction). We conclude that for an arbitrary x E So, the NR step in scaled space will be computed as in (i) in that space,

h(x, a) =-PAXVf,Q(x),

completing the proof. [ For the other auxiliary functions, the Hessian matrix is influenced by the first term,

and SSD is no longer equivalent to NR. But it can be interpreted as a quasi-Newton method in the following sense.

A quasi-Newton method minimizes at each iteration a quadratic model of the func- tion, which may differ from the Taylor expansion:

fp(x + h) fp(x) + Vfp(x)Th + 2hTEh.

The SSD algorithm uses E = V2p(x) instead of E = V2fp(x). We lose the contribution of the second derivatives of the first term of the functions. This contribution is null in the penalized function, and the methods coincide. For the potential function, the first term contributes a negative definite rank-one matrix that may destroy the positive definiteness of the Hessian matrix, and thus is ignored. For the center function, the equivalence can be reestablished by a problem transformation to be described in ?8.

3.5. Guessing a dual slack and a lower bound. The parameter used by the potential function f (*) must be a lower bound to the value v of an optimal solution. We now describe a procedure for guessing a feasible dual slack, and consequently a lower bound for v. This procedure was presented in [40], and gives the same bounds as the methods developed by Todd and Burrell [108], and by de Ghellinck and Vial [25] using projective geometry. The dual slacks generated by it have the same format as the ones used in all existent primal potential reduction methods.

Given any feasible point x E S, the procedure associates to it a lower bound v(x) > -oo. If v(x) = -oo, then the procedure fails, and no dual slack is generated. The usefulness of the procedure will be ensured by the fact (to be seen in ?5.2) that a good lower bound will always be generated at points on or near the central path.

Suppose initially that e E S. From Lemma 3.1, we deduce that a vector z E 1RW is a feasible dual slack if and only

if z > 0 and z = cp + -y, where -y I K(A). Our guess consists in trying to find a "very positive" vector -y I Kr(A) and adding it to cp to obtain a nonnegative vector. The ideal guess would be -y proportional to e, but in general e is not orthogonal to Kr(A). We try -y proportional to e - ep.

Let us define the vector a I Kr(A) given by

(24) ti e ep>

If for some it E 1, cp - ,ii > 0, then z = cp - ,ii is a feasible dual slack. The duality gap associated to the primal-dual pair (e, z) will be

A=eTZ = cT e- p fL

since aTe = 1 as is easy to see because (e - ep)T(e - ep) = (e - ep)Te.

Now v = cTe - A is a lower bound for v. The best lower bound will correspond to the maximum admissible value for /A.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

184 CLOVIS C. GONZAGA

We can now formalize the procedure, associating to each x E S a lower bound v(x) E [-oo, v] obtained by the procedure above after scaling the problem about x:

(25) A(x) = inf{eTPA- - fLI PAC - ,ii > O},

iV(X) = C X-A(X).

with the convention that inf 0 = +oo. If A(x) < +oo, then the procedure defines the dual feasible slack

(26) Z(x) = X-1(PAE - Aia),

where ft is the minimizer in (25).

3.6. Non-path-following variants of Karmarkar's algorithm. Karmarkar's original algorithm [55] is based on the potential function with q = n. It assumes that the optimal cost value v is known, and uses this parameter value from the beginning. Since the opti- mal value is seldom available, Karmarkar proposed the use of lower bounds to v, and an updating procedure. Updating procedures were soon improved in the references cited in ?3.5.

Karmarkar's algorithm is not simply the SSD algorithm applied to the potential func- tion. For completeness, we now present a very brief description of its mechanics.

First, assume that the primal problem (P) is stated in the format

minimize c Tx

subject to A'x = 0

aTx = 1

x > 0.

Obtaining this format from (P) is straightforward with the introduction of an extra vari- able. Let q = n in the potential function (20) and assume initially that v = 0.

The resulting potential function is fo(x) = n log cTx + p(x). It is zero-degree ho- mogeneous, i.e, for any x > 0, A > 0, fo(Ax) = fo(x). This means that given any point x > 0 such that A'x = 0 and aTx > 0 but aTx $ 1, the point x/aTx is feasible and has the same potential value (since aT(x/aTx) = 1). Thus the following scheme can be used:

(i) Drop the constraint aTx = 1; (ii) Use SSD to find xk such that fo(Xk) << 0; (iii) Compute x = xk/aTxk. The resulting point x has a very small potential value, and this can only be true for

points such that cTx is near zero, since the barrier term is bounded below in a bounded region. This is one of the variants of Karmarkar's algorithm, and is equivalent to his original algorithm (see [34]).

This algorithm relies on the assumption that v = 0. If v is known but not equal to zero, the algorithm can still be used with the potential function fo (x) = n log(cTx -v) + p(x). The homogeneity is restored by noting that for any feasible point x, v = viaTx, and thus on the feasible set

fv (x) = n log(c - va)Tx + p(x).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 185

If v is unknown, we must generate lower bounds vk < v and use them in the expression above. The lower bounds will be generated by the procedure (25).

We now state the resulting algorithm. It is important to remark that two different projections will be used: the computation of a lower bound by (25) uses projections into the original space Af(A), where A is formed by adding the row aT to A'. The algorithm drops the constraint aTx = 1, and hence uses projections into Kr(A'). The computations are easy, since these null spaces are simply related (see [40]).

ALGORITHM 3.8. (Karmarkar [55], Todd and Burrell [108], Gonzaga [40].) Given xo E S?, vO < v.

k := 0.

REPEAT Scaling: A' := A'Xk, c:= Xkc, a:= Xka.

Lower bound: vk+1 := max{vk,i3(xk)}. Direction: h -PA,( cTx (C- Vk+l)- e).

Line search: y e + Ah, Y > 0. Back to the feasible set: y* : jv.

Scaling: x+1 Xky*. k:= k + 1.

UNTIL convergence.

For some time it was believed that this geometry, based on the homogeneity of the po- tential function, was essential for the polynomial behavior of the algorithm, but this is not the case. If the potential function uses q > n + \/;i, with q = O(n), then the SSD algorithm can be applied directly in the original formulation (P), preceding each itera- tion by a lower bound computation as in the algorithm above. The resulting complexity is the same as for Karmarkar's algorithm.

This polynomial affine algorithm was first described by Gonzaga [41] with very rudi- mentary lower bound updates. Ye [118] and Freund [19] propose updating schemes for v when v is unknown, but these algorithms are already related to the central path, and will be discussed later.

Both Karmarkar's Algorithm 3.8 and the affine polynomial method are driven by the following fact: at all iterations, IIhII > 1. This means that the directional derivative of fV,+1 along h is quite negative, and a substantial decrease can be ensured in all iterations. This leads to the polynomial bound obtained by Karmarkar. Although the complexity analysis of these algorithms is beyond the scope of this paper, very similar proofs will be made ahead for path-following methods.

4. The central path. This section describes the central path, its properties, and gives a complete treatment of conceptual primal path-following algorithms, including their computational complexity.

The central path was first studied by Bayer and Lagarias [10], and its primal-dual properties were described by Megiddo [72].

Consider the penalized function (18). DEFINMON 4.1. The central point x (a) associated to ae ff? is defined as

(27) x(a) = argmin f,Q (x) xESO

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

186 CLOVIS C. GONZAGA

Central points are well defined for any a e ff?, since f", (.) is always strictly convex and grows indefinitely near the boundary of S, with S bounded by hypothesis. It follows that x (a) is the unique point such that PVfQ, (x) = 0, or

(28) acp - P-1 = 0.

DEFINITION 4.2. The central path is the curve a e fR+ - x(a) E So Note that we chose to define the curve only for a > 0. This is due to the other

parameterizations to be seen.

4.1. Primal-dual properties. The next lemma associates to each central point x(a) a feasible dual slack z (a) and consequently a duality gap A (a). Having access to a good duality gap is a great blessing for mathematical programmers. It provides reliable stop- ping rules and, more than this, provides the driving force for the algorithms, which will try to reduce the gap at each iteration.

LEMMA 4.3. Let x(a) be the centralpoint associated to a > 0. Then

(29) z(a) = (x(a))1 a

is a feasible dual slack, and the duality gap associated to x((a) and z (a) is

(30) x(a)Tz(a) n a

Proof. From (28),

Px(a) -1 = acp,

or

px(a)- 1 P _ a P

Since x (a) > 0, x(a) /a satisfies the necessary and sufficient conditions to be a feasible dual slack by Lemma 3.1. The duality gap is

1 i n _x(av)x(av)-1=a! a a

completing the proof. [

4.2. Parameterizations of the central path. Each of the auxiliary functions defined in 3.4 generates a parameterization p |-- x(p) of the central path. We shall describe them, resorting to Table 1, which lists the main features of each parameterization. Before addressing each parameterization, let us summarize the results in the table.

The first row describes each auxiliary function at a point x in its domain. The second row describes their gradients at x. Note that if x is a central point, then

PAVfp(x) = 0, and, consequently (equating the projected gradients),

q q K - cTX

a CTXT-xv

The third row uses this relation and Lemma 4.3 to associate a duality gap A(p) to each of the parameterizations: simply substitute a in A (a) = n/a.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 187

TABLE 1

Parameterizations of the central path.

PARAMETERIZATION

Primal Primal-dual Dual

fp(x) -qlog(K - cTx) + p(x) ac TX + p(x) q log(cTx -v) + p(x)

q l K1 q Vf (x) K-cT x CT-1 C X

n _ CTX) ~ n n CX_V Gap A(p) - (K-cTx) n n(cTx_v) q aq

Update p K'- cTX :=3(K - cTX) a (CTX - v') = 3(cTx- v) (reduce gap) /3 (0,1) 3 3 E n 1

Update K' =/3K + (1-3)cTx v' =/3v + (1-3)cTx

K

/3 K'

.1- 3 cTx(K) cTx(V)

v(K) 3

v

The fourth row is the most interesting of all. Given the central point x(p), the con- ceptual path-following Algorithm 2.1 asks for a parameter update to take a step along the central path. In all methods the update will be interpreted as "try to reduce the duality gap by a ratio fi E (0, 1)." This row proposes precisely this: it indicates how a parameter value p' should be chosen to enforce a gap reduction A' = 3A (p). Note that after a new centering finds x(p'), the resulting gap A(p') may be different from the desired value A'. This will be discussed below. Note also that 3 cannot be chosen freely for the dual parameterization: this will also be discussed in a moment.

Finally, the last rows rearrange the expressions in the fourth one, singling out the updated parameter value, and give a graphical representation of the parameter updates.

Primal-dual parameterization. The parameterization used in the definition of cen- tral points (Definition 4.1) will be called primal-dual. It is based on the penalized func- tion (18), and occupies the central column in Table 1. The denomination "primal-dual" is due to the close relationship between the parameter a and the duality gap associated to a central point, A(a) = n/a.

The description of the central path based on this parameterization was first done by Fiacco and McCormick [18]. Gonzaga [36] used it to construct a path-following algo- rithm with a complexity bound of O(n3L) arithmetical operations.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

188 CLOVIS C. GONZAGA

The conceptual Algorithm 2.1 is specialized for this function simply by specifying the command "Choose Pk+1 > Pk"

Set ak+1 -ak, /13

where 3 E (0,1) is an arbitrary number. The effect of 3 on global complexity will be studied in ?7.1.

After one iteration of the conceptual algorithm finds xk+1, we have L\(ak+l) =

f3A(ak), by a direct application of the duality Lemma 4.3. That is, the gap reduction obtained in one step of the algorithm is equal to the desired reduction 3: the method is "realistic."

Primal parameterization. Using the auxiliary function (19), define

(31) K > v -4 x(K) = argmin{fK(x) I x E SI, cTx < K}.

This characterization of central points was first used by Huard [50] in his method of centers for nonlinear programming. It was the basis of Renegar's algorithm [96] for linear programming, the first method to achieve a complexity of O(V/;iL) iterations. His method was refined by Vaidya [111], who proved a complexity of O(n3L) arithmetical operations.

The parameterization is called "primal" because the parameter K is an upper bound for the primal objective value. The geometrical interpretation is very appealing, as shown in Fig. 4.1. First note that the definition of analytic center is easily extended to regions defined by inequality constraints by considering slacks for the inequalities. Thus the center of {x E En I x E S, CTX < K} is the minimizer of -q log(K - CTX) + p(x).

FIG. 4.1. Central points in primal parameterizatiom

x(K) is the analytic center of the region {x E S I CTX < K}, with the constraint cTx < K repeated q times.

Remark. This is a nice opportunity to see why the center is "analytic," and not geo- metric. The inclusion of q copies of the constraint cTx < K increases the weight of this constraint in the auxiliary function, with the effect of pushing the center away from its

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 189

active region. Renegar found the clever property that for q > n the center is more than half way toward the optimal solutions, that is, cTx(K) < (v + K)/2.

Again, the characterization of central points is well defined, since fK (x) is strictly convex and grows indefinitely near the boundary of the restricted feasible region. The central point x(K) associated to K > v is uniquely determined by the condition

(32) Kq c - Px =0.

The first column of Table 1 describes the features of this parameterization. Consider x = x(K), and let us look closely to the duality gap for the case in which q = n. We have

A(K) = K - cTx(K).

The duality gap equals the slack in the extra restriction cTx < K. This shows that indeed cTx is lower than half way between v and K, and provides plenty of room to decrease K while keeping the constraint cTx < K inactive. There is still more room if q > n.

The conceptual Algorithm 2.1 is specialized by specifying the update rule:

Set Kk+1 :=f3Kk + (1-f3)c xk.

The duality gap at x(Kk+1) will be such that A(Kk+l) > 3A(Kk), since A(Kk+1) =

(n/q)(Kk+l - cTxk+1), and CTXk+l < CTXk. This characterizes the method as "opti- mistic," since the actual gap reduction is smaller than the desired reduction. Note that this does not destroy convergence, since K - v has a sound decrease at each iteration, as we show now.

LEMMA 4.4. Consider the center finction fK(*) with q > n and the central point x = x(K) for K > v. Define r = n/q and

K' =f3K+(1 -3)cTX.

Then

K '-0 r +f3 1 +/3 (33) K'- v< r < 1+ (33) ~~~~ ~~K-0' 1+r - 2

and the gap is related to K - v by

(34) A(K) < K-v < n A(K).

Proof. Let v < v be defined by v = cTx - A(K). Using row 3 of Table 1, A(K) =

r(K - cTx), and hence

(35) K-v = K-cTx + r(K-CTX).

Using row 4 of Table 1, K' - cTx = 3(K - CTx), and hence

K' - v = (K - cTx) + r(K - cTX).

Dividing this last equation by (35),

K'-v 3+r 1+/3 K-v 1+r - 2

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

190 CLOVIS C. GONZAGA

since r < 1. Finally, since v < v < K' < K,

K'-v K'-v K-v - K-v'

proving (33). The left inequality in (34) stands because

A(K) = -(K-cTx) < K-v, q

since n < q and K - cTx < K - v. To prove the last inequality, note that K - v < K-c Tx + A(K). But K - cTX = (q/n)A(K), and it follows that

K- v< +q ?(K),

completing the proof. O We conclude that (Kk -0) decreases by a fixed ratio per iteration. This lemma clears

the role of q: the reduction in (Kk -0) per iteration (the steplength of the path-following algorithm) can be arbitrarily chosen by increasing q.

Dual parameterization. Using the auxiliary function (23), define

(36) v < v -4 x(v) = argmin{f,(x) I x E S?}.

This parameterization is based on the potential function that Karmarkar [55] defined with q = n. The beneficial effects of using other values for q (specially q = n + V,/;i) were noticed by Gonzaga [41] and by Todd and Ye [110].

The parameterization is called "dual" because the parameter v is a lower bound for the optimal value v, and is always associated to the dual cost of a feasible dual solution.

Although here the objective is not convex, we have

f()= log (CTX -

)

and the argument of the logarithm (the multiplicative potential function) is strictly con- vex, as proved by Imai [51]. Since the logarithm is a monotonically increasing function, the minimizers of both potential functions coincide, and x(v) is well defined for any v < v, since then fv(*) grows indefinitely near the boundary of S. The central point x(v) is uniquely characterized by

(37) cTq - v.

The last column of Table 1 describes features of this parameterization. The duality gap associated to the central point x(v) is given by

A(v) =-(cTx-v).

This means that given a lower bound v, the central point x(v) defines a new lower bound v' by

T n CTX T V) c x-v = -( x-) q

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 191

If n = q, then there is no gap reduction: if q > n, then any value of v' such that

c x -v =3(cTx v), 3 E (n/q, 1)

provides a new lower bound. This gives a safe range for 3 at a central point: 3 E (n/q, 1). The minimum possible value for 3 is obviously (cTx V)/(cTx -v), but this is not accessi- ble. The minimum safe value for 3 that we know how to compute is = (x)/(cTx -V), where i(x) is the guessed gap from (25): this would generate the largest possible steps, which will be discussed in ?9.1.

The conceptual Algorithm 2.1 is specialized for the dual parameterization by the command

Set vk+1 :=/3vk + (1-/3)cTxk,

where 3 E (n/q, 1). Like for the primal parameterization, the value of q determines the maximum al-

lowed gap reduction per iteration, and consequently the maximum steplength in the path-following algorithm. As the lower bound is updated, there is an immediate gap reduction. At xk+1, the gap will be still lower because cTxk+l < cT xk : the update is "pessimistic" because the actual gap reduction is better than we aimed at.

We have seen three parameterizations of the central path. They are related by the following lemma.

LEMMA 4.5. Consider a point x E S?. The following statements are equivalent. (i) x = x(a) for some a > 0. (ii) x = x(K) for some K > v. (iii) x = x(v) for some v < v. If either statement is satisfied, then the parameters are related by

(.3 8 a = ~q _ q . (38) a K -CTX CTx-v

Proof. Immediate by comparing the necessary and sufficient conditions for centrality in the three cases. O

4.3. Complexity of the conceptual path-following algorithm. Exact central points cannot be exactly computed in finite time and using finite words. The actual algorithms will have to deal with nearly central points, and this will be done from the next section on. But we can already examine the complexity of a path-following algorithm assuming that exact centering is done by an oracle in fixed time.

In studying the complexity of the conceptual algorithm we shall specify the stopping rule. As should be expected, the algorithm will stop when the duality gap is small. We shall assume that an initial central point xo = x(po) is given, and that the updates are made according to Table 1. For primal and dual parameterizations, we assume that q = O(n), q > n.

ALGORITHM 4.6. Conceptualpath-following: given po, xo = x(po), 3 E (0, 1) and an integer L > 0 (for the dualparameterization, 3 E (n/q, 1)). k := 0.

REPEAT Compute Pk+, by the update procedure in Table 1. Call an oracle compute xk+1 := x(pk+1).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

192 CLOVIS C. GONZAGA

k := k+ 1. UNTIL A(Pk) < 2L

Complexity. At this point we can compute the number of iterations needed to achieve termination. We must first comment the stopping rule, and on the significance of the number L.

In complexity studies we seek the order of magnitude of the number of computations (iterations, arithmetic operations, bit operations) needed to find an exact solution of the problem in the worst possible case. This number is computed in terms of the size of the problem, and that is a tricky point.

The size of a linear programming problem was defined by Khachiyan [58], when he devised the ellipsoidal method, the first polynomial algorithm for linear programming. The subject was further developed in many references, like Gacs and Lovasz [23], Bland, Goldfarb, and Todd [12], Megiddo [71], Schrijver [100], and Karmarkar [55].

We assume that only integers of a limited length can be computed exactly. The problem (P) is assumed to be stated with all-integer data (rational data can be easily converted to integers), and the size of the problem is defined as

L = l + n + 1,

where 1 is the total length of the input, i.e., the total number of bits used by the numbers in A, b, c. The significance of this number is the following: 2L is a very large number, greater than the result of any arithmetical computation using data without repetitions, and also greater than any minor determinant of A. It follows that 2-L is a very small number, with the following properties.

(i) If x is a vertex of S, then no component of x can be in the interval (0, 2L). (ii) If x is a vertex of S, then cTx - V V (0, 2-L). A sketch of the proof of (i) is as follows: if x is a vertex of S, then ABXB = b, where

B c 1,* *, m are indices of a basis and AB is a nonsingular submatrix of A. It follows that

A-b_ (cof(AB) )Tb XB = AB'b - det(AB)

But det(AB) > 2-L, and all entries in the numerator are integers, proving (i). The proof of (ii) is straightforward using the integrality of the costs.

The stopping rule for all algorithms is precisely (ii) above. If an algorithm obtains a point such that cTx - Vj < 2-L, then an exact optimal solution can be obtained through the following lemma.

LEMMA 4.7 (purification). Given any point x E S, there exists a procedure that com- putes a vertex x of S such that cTx < cTx, in no more than O(n3) arithmetical operations.

We shall not prove the lemma. For a construction of the vertex, see, for instance, Kortanek and Zhu [66]. The construction of a vertex is an exercise on simplex tech- niques: each iteration of a purification algorithm reduces one variable to zero along a descent direction for the cost, doing pivoting like the simplex method. No more than n iterations are needed, with 0(n2) computations per iteration.

Putting together the facts above, an algorithm can be stopped whenever a solution is found such that

cT xV<2L

In this case, a purification leads to a vertex x with cTX - V < 2L, which by (ii) above implies in c x - v = 0.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 193

For our conceptual algorithm, set E = 2-L, and the stopping rule becomes

ak > n2L.

Complexity of Algorithm 4.6. LEMMA 4.8. Assume that Algorithm 4.6 starts with A(po) < 2L. Then the number of

iterations is bounded by 0(L/ log 9), where 9 = 1/f for the primal-dual or dualparameter- izations, 9 = 2/(1 + [) for the primalparameterization.

Proof. (i) for p = a or p = v we have after k iterations and before stopping,

2"L < A(pk) < 3kA(p )

Taking logarithms in base 2, we obtain -L < k log [ + L, or equivalently,

k < 2L/log(1/3) = O(L/log9),

completing the proof for primal-dual of dual parameterizations. (ii) If p = K, we obtain by Lemma 4.4

Kk -V < Ok(Ko-V).

Using again Lemma 4.4,

2-L < ?(Kk) < q + A(Ko). n

Taking logarithms, and remembering that q = 0(n), -L < -k log 9 + 0(L), or equiva- lently k < 0(L/ log 9), completing the proof. O

The choice of 3. The complexity of the implementable algorithms depends on the number of SSD iterations needed in each iteration of the master algorithm, and that depends on the definition of nearly central points to be done in the next section. As a preview of the kind of results to be found, let us examine the two main ways of choos- ing,3.

(i) [ independent of n: in this case the complexity of the conceptual algorithm is O(L), but the internal SSD algorithm may need a large number of iterations. We shall see that the best global complexity obtained for these large steps is 0(nL) SSD iterations.

(ii) [ dependent on V/ni such that 1/fl = 1 + v/6/n: in this case the number of iterations in the internal algorithm is dependent on v, and the global number of SSD iterations will be bounded by 0(v#nL).

For small values of v (short steps), it is possible to ensure exactly one SSD iteration per iteration of the path-following algorithm (to obtain nearly centered points). This scheme gives the best theoretical complexity bound obtained up to now in terms of SSD iterations, and was obtained by Renegar [96]. Each SSD iteration has the complexity of one matrix inversion, used in the projection computation, O(n3) arithmetical oper- ations, bringing the overall complexity to O(n3_5L) operations. A further reduction in this number to O(n3L) can be obtained by computing the inverses by rank-one updates, in a scheme first used by Karmarkar [55], and later by Gonzaga [36] and by Vaidya [111].

Higher values of v keep the same theoretical complexity bound, and produce "not- so-short steps methods," first obtained by Gonzaga [37]-[38].

5. Nearly central points. Since it is impossible to compute a central point x(p) ex- actly, we need pragmatic criteria to decide when a point x is "near" the central point x(p).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

194 CLOVIS C. GONZAGA

The internal algorithm in methods that follow model 2.2 will then stop based on such a proximity criterion. This is not a new problem. Polak [95] devised an implementable bar- rier function algorithm twenty years ago, using as stopping rule for the internal iterations for ak a condition like IIPVfatk (x) II < Ek where Ek > 0 decreases with k. He provided a convergence proof for general nonlinear programming problems. Similar rules were proposed by Fiacco and McCormick, and used in the implementation of their SUMT-3 and SUMT-4 computer codes, described in Mylander, Holmes, and McCormick [91], with no convergence analysis. This last algorithm, with a proper choice of parameters, has recently been shown by Anstreicher [6] to be polynomial for linear and quadratic programming problems, with a bound of O(/niiL log L) iterations.

We now see that Polak's proximity criterion is not satisfactory, and the main reason is that it is scale dependent. That is why he needed an increasing precision given by Ek -+ 0. In this section we study proximity criteria, and conclude that a very convenient one coincides with Polak's if the computation is preceded by a scaling transformation.

All our computations are scale-independent. In fact SSD precedes all computations by a scaling, and NR is naturally scale-independent. To fix the notation, recall the SSD directions from (22) and (23): h(x, p) in the original space and h(x, p) in the transformed space, with

(39) h(x, p) = Xh(x, p).

And now define a new entity:

(40) 6(x,p) = lIh(x,p)ll,

the length of the SSD (or NR) step in scaled space. We shall see that this magnitude is a good measure of the proximity from x to x(p).

5.1. Informal discussion of proximity. Here we intend to show the geometrical rea- soning behind the definition of nearly central points. Let us consider the analytic center X of S, and extend the results to the penalized function later.

Imagine initially that X = e. In this simple case, the potential function around e can be written as

p(e + h) = p(e) + Vp(e)Th + 'hTV2p(e)h + o(e).

Substituting p(e) = 0, V2p(e) = I from (13), and since Vp(e)Th = 0 because e is the analytic center,

p(e + h) = 1IIhII2 + o(h).

Figure 5.1 shows level curves for this function. For small values of lIhil (small in relation to 1), the level curves are almost spherical, as a consequence of the excellent behavior of the logarithm near 1. That is, the quadratic approximation is very good near the center: it is easy to show that for lIhlI < 0.3, the maximum error in the quadratic approximation is about 30 percent, independently of the dimension of the space.

Here, two magnitudes seem good measures of proximity to the center at e: lIx - elI and p(x) - p(e). If either llx - ell < 0.3 or p(x) - p(e) < 0.05, x is in a region where the quadratic approximation is good. Note that large values for these quantities give no useful information besides the fact that x is not nearly central.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 195

FIG. 5.1. Level curvesfor p(-) near the analytic center.

These two proximity criteria are not useful for practical methods, since their com- putation depends on the knowledge of the center. Let us describe the third and best criterion: the NR step h(x) from x. Figure 5.1 illustrates this direction. At points where the quadratic approximation is good, the NR step from x is a good approximation to the step to the minimizer of the function, i.e., h(x) ; e - x, or IIh(x)I I lIx - x(a)II.

This informal analysis leads us to three related quantities for a center at e: function variation, Euclidean distance, and length of the NR step. The first one is scale-invariant, the other ones are norm dependent. Since all criteria are similar near e, the extension to the general case is obvious: precede the computation of proximity by a scaling.

Remark. These observations are a guide to the examination of Fig. 5.1. Near e things work well. The converse, small NR step (after scaling) implying proximity is far from trivial, but is true, as we will show in the formal analysis below. We hope to have convinced the reader that given a point x, it stands a good chance of being near X (in terms of function values) if the scaled NR direction has a small norm, i.e., II h(x) ?I < < 1.

The penalized function differs from the barrier function by a linear term, and noth- ing changes in the analysis of the second-order approximation. The criteria will now be from a point x: f, (x) - f (x (a)), IX-1 (x - x(a)) II, and 6(x, a) = IIh(x, a) I I.

5.2. Formal treatment of proximity. We saw in Lemma 4.3 that central points have nice primal-dual properties. The SSD direction allows the extension of these properties to a large region that contains the nearly central points, as we show now. The lemma below extends a duality result in Gonzaga [38].

Remark. We shall use the notation introduced in (20) and (22) for the SSD direc- tions, remembering that h(x, a) = Xh(x, a) and the proximity criterion is 6(x, a) =

h(x, a) II. LEMMA 5.1. Consider a penalty parameter a > 0 and a point x E S?. If h(x, a) < e

then

= -1 e - h(x, a) a

is a feasible dual slack and the duality gap is

(41) , n-eTh(x, a) < n+6(x,a)f/ii a a

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

196 CLOVIS C. GONZAGA

Proof. z > 0 follows from the hypotheses. From Lemma 3.2, z is feasible for (P) if and only if

e e-h(x, a) z = -

a

is feasible for the scaled problem. Using Lemma 3.1, we must prove that PA = PAC. Projecting,

pAZ _ PAe - PAh(x, a) a

But h(x, a) E Ar(A), and thus

PAh(x, a) = h(x, a) = -aPAC + PAe.

Simplifying the last expressions we obtain PAZ- = PAC, completing the proof of feasibil- ity.

The value of the gap in the scaled problem (and also in the original one) is

e,2_' n + eTh(x, a) a

The last inequality follows from e'h(x, a) < lie h IIh(x, a) I =# 6 (x, ac), completing the proof. D

This lemma will be very useful both for providing stopping rules for algorithms and for updating the lower bounds v in potential reduction algorithms. The particularization of the lemma to nearly central points is immediate.

COROLLARY 5.2. Consider a penaltyparameter a> 0 and a point x E So. If 6(x, a) < 1 then the conclusions of Lemma 5.1 hold.

Proof. Assume that 6(x, a) = lIh(x, a) < 1. Then Ihi(x, a)I < 1 for i = 1 n. It follows that h(x, a) > -e, completing the proof. D

The lemma above provides a way of guessing a (not necessarily feasible) dual slack, associated to a given x E SO and a valid parameter value p. We shall introduce new notation for this slack and for the associated duality gap and lower bound. Here dp PAd.

In the expressions below the primal-dual parameter a appears. For the other pa- rameterizations, substitute the expressions in (38).

z(x p) = X_e-h(x, a) =X-1( +( )/a) a

(42) A(X,P) = XTz(x,a) if z(x,a) > 0; A(x, a) = +oo otherwise.

V(X,p) = cTx - A(x, a).

Computing the gap in the expressions above for a given point in all parameterizations, we obtain (if A(x, a) < +oo):

(43) (x, P) = neh neh

(K-CTx) = n e (CTx _ v), A(x,p) ~a q q

where h = h(x, p). Summing up, we now have three closely related definitions for these entities: z(x, p), A (x, p), and v(x, p) are associated to any x E SO and valid p by Lemma

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 197

5.1. z(p) = z(x(p), p), Ai(p) = A (x(p), p), v(p) = v(x(p), p) are associated to central points by Lemma 4.3. Z(x), A (x) and v (x) are associated to any x E SO by (26) and (25). It is easy to see that

(44) i\(x) = inf{ A(x, a) a a > O}.

The following lemma, useful in the study of primal-dual methods, is stated here for com- pleteness. It relates our proximity criterion to the criterion used in Roos and Vial [98].

LEMMA 5.3. Consider a penalty parameter a > 0 and a point x E SO. Then

6(x, a) = min{lIaXz - ell I ATw + z = c, w E Jm}, z

and the minimum occurs at z(x, a). Proof. From (6), we know that given any vector d and a > 0, IIPAdll = min{d -

aATW I w E Em}. Hence

6(x, a) = lIh(x, a) 11 = min{lIa-e -e-aA IwI w E Em}

= min{lIaX(c - ATw) - ell I w E Em}

= min{lIaXz - ell I ATw + z = c, w E Em.}

We must still prove that the minimizer is z(x, a), i.e., that 6(x, a) = IIaXz(x, a) - eli. This is straightforward by substitution of the expression for z(x, a) from (42): this last equality becomes 6(x, a) = IIh(x, a)II, completing the proof. D

The main results on proximity will now be summarized in one lemma, and the proofs will be made afterward in several lemmas. The treatment of proximity is present in all papers that deal with path-following algorithms, and in some way all papers use the same proximity criterion: the length of the scaled SSD direction 6(x, a). The analyses were initially very involved (see for instance Gonzaga [36]), and several treatments appeared since 1987, like the paper by Renegar and Shub [97]. In our opinion, the best of all proximity theorems are the ones in Roos and Vial [98] and in den Hertog, Roos, and Vial [49], stated in items (i) and (iv) of the lemma below. Our proofs for these items will follow directly these references, and are very elegant. The proof of the other items is inspired in the cited work by Renegar and Shub.

We would also like to make reference to the extensive and deep work on proximity and its consequences made by Nesterov and Nemirovsky in their book [94]. They define a class of functions called "self-concordant" for which conditions like the ones in the next lemma can be proved, and show how algorithms can be constructed for a wide range of Gptimization problems.

LEMMA 5.4. Consider a penalty parameter a > 0, a point x E SO and its proximity 6 6(x, a).

(i) Efficiency of the SSD step. If 6 < 1, then x = x + h(x, a) is feasible and

6(X, a) < 62.

(ii) Proximity and Euclidean distance. If 6 < 1, then

1X'-l (x - x(a)) 11 1 - 6

(iii) Proximity and function values. If 6 < 1, then

62

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

198 CLOVIS C. GONZAGA

(iv) Guaranteed descent. x = x + (1/(1 + 6))h(x, a) is feasible and

fa(x) < fa(x) - (6 - log(1 + 6)).

Proof. See lemmas below. D Property (i) is the most striking of all. It states that the NR step decreases in a

quadratic rate from any initial point such that 6(x, a) < 1. This means that SSD or NR algorithms reduce the proximity in a sequence of values

6,62 64, 62.

Parts (ii) and (iii) show that the concept of proximity is well defined from a geomet- rical point of view.

Part (iv) conveys all the information needed for nonproximal points. If 6(x, a) is large, than one SSD iteration results in a sound decrease of the objective function.

The lemma above applies only to the primal-dual parameterization. All these prop- erties can be adapted to the center function, either by using the Newton-Raphson direc- tion instead of the SSD direction, or by a simple transformation in the problem, as we shall comment in ?6. There are no results similar to (i)-(iii) for the potential function, but (iv) is trivial if one notes that its first term log(cTx - v) is strictly concave. We shall expand this argument in ?9.

53. Proofs for Lemma 5.4. All properties to be studied are scale-invariant, and all proofs become easy if we start them by scaling the problem. To keep notation as simple as possible, we shall avoid the bars that characterize scaled problems by making the assumption that the given point is x = e.

LEMMA 5.5. Let a > O and x E SO begiven, and let xi = x + h(x, a). If 6 = 6(x, a) < 1, then x E SO and (,a) < 62.

Proof. Assume without loss of generality that x = e and 6 = 6(e, a) < 1. Then

6(e,a) = IIh(e,a)II, h(e,a) = -PVf, (e) = -acp+ep.

Since 6 < 1, x = e + h(e, a) > O and x E S?. By definition of projection,

(45) Vf,f (e) =-h(e, a) + ATiI3,

where iwv E Em. Now consider the point x = e + h(e, a): the proximity at this point is given by

6(xi,a) = llh(x,a)II, h(x, a) = PAkXVfa(i)

It follows that

6(x, a) = IIPA,X Vfa (x) I I Using now the property of projections (6),

6(x, a) = min IIXVfa(,) - XATwII. wERtn

And now, the key step of the proof: in particular for w = w,

(46) 6(i, a) < 11XVfa(x) - XA Twzll.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 199

The rest of the proof is the simplification of the vector

XVfL(x) -XATw.

First, extract AT i from (45):

y : = -(Vfa (x) - Vfa (e) - h(e, a)).

In this expression, Vf]f(i) - Vf]f (e) = - + e, and h(e, a) = - e by construction. Substituting into the expression above,

-y = Q- + 2e -x 1

Fori = 1, ***,n

-iy = -X + 2i - 1 =(i,- 1)2.

Using again the fact that x - e = h(e, a) by construction,

'Yi =(hi (e, ax))2.

Computing the norm, n

Ih'112 = (hi(e, a))4 i=l

n \2

< (hi (e, aX))2)

1 IIh(e, a) 114.

Finally, introducing this in (46),

6( ,a) < IIh(e, a) 112 = 6(e )2

completing the proof. D The next lemma relates proximity and Euclidean distance. The proof is done by

summing the series of NR steps. LEMMA 5.6. If 6 = 6(x, a) < 1 then 1 X-1 (x(a) - x)j | < /( 1-6). Proof. Assume without loss of generality that x = e. Consider the sequence gener-

ated by the NR algorithm

xo = e, x = e + ho, ... xk = Xk-1 + hk

where hk denotes h(xk, a). Consider also the scaled directions hk = Xj- hk h(xk, a) and remember that 6(xk, a) = Ilhk11

We shall prove by induction that

(47) lIXk _ ell < 6 + 62 + 63+

+ 62k_1

and consequently for i = 1 , n.,

(48) IX?l < 1+6+62+63+ +62k_1

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

200 CLOVIS C. GONZAGA

(i) For k = 1, llxl - ell = Ilh 11 = 6(x, a) = 6. (ii) Assuming that (47) is satisfied at xk,

k+1 _ k = k= -k x -x hk= khk sk+1 _ sk =X-kk xi xi i

Using (48),

liX+1 Xil < (1 + 6 + 62 + 63+ + 62k_j)ilhl

and it follows that

iXk+l xk < (1 + 6 + 62 + 63 + + 622k_j)i-kl.

By Lemma 5.5, Ilhk11 = 6(xk, a) ? 62k, and hence

(49) |lxk+l - xkll < (62k + 62k+1 + + 62k+11).

But lixk+1 - ell < lixk _ ell + lixk+l _ xk 11. Adding (47) and (49), we obtain

ilXk+ el < 6 + 62 + 63 +... + 62 k+1,

completing the proof by induction. The conclusion of the lemma follows from the fact that the series in (47) adds to 6/(1 - 6), completing the proof. D

LEMMA 5.7. If 6 = 6(x, a) < 1, then f,(x) - f,(x (a)) < 62/(1 _ 62).

Proof. Let f'a (y) = aeTy + p(y) be the scaled penalized function with x _ Xy. Since variations of the barrier function are scale-invariant and the penalized functions are convex,

f (x + h(x, a)) - fa(x) = f1(e + h(x, a)) -f (e) > Vfa (e) h(x, a) = -Ilh(x,a) 112 = -6(x, a)2.

For x = xk,

fa(Xk) - fa(Xk+l) < 6(x k, a)2.

But by Lemma 5.5, 6(xk, a) < 62k, and thus

fa(Xk) _ fa(Xk+1) < 62 k+1

f~(x?)-ft(Xk) < 62 + 622 +623 + +62k

< 62+64+66+...

62

1 + 62'

completing the proof. D

The last lemma shows that the SSD algorithm really obtains a sound decrease in f' (.) far from a central point.

LEMMA 5.8. Let 6 = 6(x, a). Then x = x + (1/(1 + 6))h(x, a) is feasible and

fa (x) < fa (x) - (6 - log(1 + 6)).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 201

Proof. Assume without loss of generality that x = e, and define A = 1/(1 + 6), h = h(e, a). Then8 = lihil. Since lAhl = /(1 + 6) < 1, i = e + Ah > O, and x is feasible.

The Taylor expansion of f,(.) about e gives 00

A f, = f, (e + Ah) - f, (e) = A-IIhI2 + E tk, k=2

where

tk=(A)kn Ak~ (nh k/2 A Ak k E k ( ) k

It follows that

Af, < -A62 A6k k=2

= _A62 - A6 + E k=1

= -Ab(1 + 6) - log(1 - A6).

Substituting A = 1/(1 + 6) (the minimizer of this last expression),

Afa < -6 + log(1 + 6),

completing the proof. D

6. Centralization. Consider a parameterization of the central path and a valid pa- rameter value p. The centralization problem is defined as: given xo E SO ande E (O, 0.5), find a point x E S such that

65(x,p) = jjh(x,p)jj < E.

The point x is called "nearly centered." The choice of E is quite free, depending on the algorithm. The most popular values are around E = 0.5.

Finding the analytic center. When the objective function is simply p(.) (identical with f,(.) for a = 0), we are looking for an approximation for the analytic center X of S. The proximity measure is given simply by 6(x) = IIPAxe I. This problem was solved by Vaidya [112] using the SSD algorithm, or equivalently, NR algorithm with line searches. The complexity analysis is very simple, as we shall now see. The number of iterations depends on E and on p(xO) - P(X).

LEMMA 6.1. Let xo E SO ande E (0, 0.5] be given. The sequence (xk) generated by the SSD algorithm applied from x? has the following properties:

(i) Xk is nearly centered, i.e., 6(Xk) < 0.5 for k > ki = 11(p(xO) - p(X)). (ii) 6(Xk) < E for k > k1 + log2 1log2 '1. Proof. (i) is a direct consequence of Lemma 5.4(iv), which ensures a descent of

0.5 - log 1.5 > 0.094 for points not nearly centered. The number of iterations at such points cannot exceed (p(xO) - p(X))/0.094, completing the proof of (i).

(ii) Assume that 6(Xkl) < 0.5. Then

6(xkl+i) < 0.52i

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

202 CLOVIS C. GONZAGA

by Lemma 5.4(i). For any iteration before 6(xkl+i) < 'E

0.523 >e

Taking logarithms twice, 2i < - log2 E, j < log2 1 log2 Ej, completing the proof. D We conclude that for E = 0(1) an approximate center with 6(x) < e is found in

O(p(xO) - p(X)) iterations. Centralization in primal-dual parameterization. Since ff (.) differs from p(.) by a

linear term, the result above on the performance of Newton-Raphson's method can be immediately extended to the computation of nearly central points. Again, we conclude that for E = 0(1) the SSD algorithm starting at xo E SO finds x such that 6(x, a) <E in O(fa (xO) - fa (x (a))) iterations.

This result is not useful as it is because f, (x (a)) is usually unknown. The situation created by the path-following Algorithm 2.2 is the following: a point xk near x(ak) is known, and SSD is called to find a point xk+l near x(ak+l), where ak+1 = ak/f, [ < 1. This situation is reduced to the one above and solved in the next lemma, proved in [37].

LEMMA 6.2. Suppose that xo E S0 and a > 0 are given and 6(xo, a) < 0.5. Let a' =(1+ )a,A > 0.Then,

(i) if IL < 0.1/ fli then SSD finds x such that 6(x, a') < 0.5 in one iteration; (ii) otherwise, SSD finds x such that 6(x, a') < 0.5 in O(ng2/1 + IL) iterations. Proof. (i) Assume without loss of generality that xo = e (or do a scaling), and

6(e, a) = Iaccp- epII < 0.5. It follows that

(50) allcpll < lIepll +0.5 < ~/n+0.5 < 1.5V6.

Now let a' = (1 + v/ln)a, v < 0.1:

6(e, a') = lIa'cp - epll = acc - ep + 7=acp.

Since a IIcpII < 1.5, we obtain

6(e, a') < lIacp - epll + 1.5v < 0.7.

Hence 6(e, a') < 0.7. Lemma 5.4(i) ensures that one SSD iteration produces a point x such that 6(x, a') < 0.72 < 0.5, completing the proof of (i).

(ii) Assume that g > 0.1/#. To simplify the notation, let us define a = x(a), b = x(a'). We shall find a bound for f], (xO) - fa, (b) and use Lemma 6.1. Note initially that for any x E So, fa'/ (x) - faf (x) = itacTx.

- A bound for f]f' (x?) - fci' (a):

fcz'(x0) - fa'(a) = fc(x?) - f,(a) + jtacT(xO - a) < fcz(x?) - fcz(a) + btallcllllxO - all

62 1.5# 6 ? l-62 a 1-6'

using Lemma 5.4 (ii), (iii), and (50). For 6 < 0.5 and 1X > 0.1//?,

(51) f" (x?) - fc (a) = 0(it v;;;).

- A bound for A f]' (a) - fa'/ (b): as above,

fa, (a) - f' (b) = fcz(a) - f,(b) + itacTa - itacTb.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 203

By definition of fa(f), fa(a) < f, (b), and hence

(52) fc,(a) - fa,(b) < ita(cTa - c b). But a = x(a) and b = x(a') are central points, respectively, associated to the lower bounds v(a) < v (a') such thatby Lemma 4.3, cTa-v(a) = n/a and cTb-v(a') = n/a'. It follows that

cTa_cTb =v(a)-v(a') + n n

a a' a n n n 1 n ji - a a/ a V +IL a +IL

Merging this with (52), 2

- Finally, adding (51) and (53), we have the desired bound:

f' (x?)-f' (b) < n + _ +=)

By Lemma 6.1, the number of SSD iterations needed to find a nearly central point will be of this order, completing the proof. O

Centralization in primal parameterization. In primal parameterization the objec- tive function

n

fK(x) = -qlog(K _ CTx) log xi i=l1

differs from the barrier function by a nonlinear term, and thus SSD and Newton-Raphson do not coincide. Although it is possible to prove results similar to those in Lemma 5.4 for this function, this seems to have little interest. In fact, the minimization of fk (.) is the computation of the analytic center of the region defined by SO with the extra constraint cTx < K, repeated q times. This problem can be put in our format (with equality con- straints) by defining q slack variables xn+1, ... , Xn+q, all constrained by xn+j = K- cTx, and now the objective function becomes a barrier function,

n+q

fK (x) =-Eog xi. i=l1

The application of SSD to this transformed problem is equivalent to Newton's method, and all results obtained for the primal-dual parameterization can be extended to this case. Note that this reduction of the centering problem is trivial if the original linear programming problem is defined with inequality constraints.

Results equivalent to Lemma 6.2 (i) and (ii) were proved, respectively, by Renegar [96], Renegar and Shub [97], den Hertog, Roos, and Vial [49] and by den Hertog, Roos, and Terlaky [48].

Centralization in dual parameterization. The first term of the potential function is not convex, and thus the results in Lemma 6.2 cannot be extended to this case. In primal potential reduction algorithms SSD is used, and the complexity analysis must follow a different path, to be described in ?9. This is probably the reason why the complexity of these methods was lowered to O(f/iiL) iterations only two years later than for the other approaches.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

204 CLOVIS C. GONZAGA

7. Implementable penalty function and center methods. Now all the tools needed for the construction of implementable methods are ready. The implementable path- following Algorithm 2.2 will be completely specified, using the update rules in ?4. The word "near" will be replaced by the proximity criterion from ?5. The complexity analysis for primal-dual and primal parameterizations will be easily done by merging the results for the outer algorithm in ?4.3 and for the centralization methods in ?6. We shall develop a complete treatment for the primal-dual parameterization and indicate its adaptation to the primal parameterization. Dual parameterization will be treated separately in ?9.

Initialization. All methods need an initial parameter value po and a point xo near x(po). This can be obtained by a centralization starting at a given feasible point. An initial feasible point is usually obtained by a problem manipulation (big-M method), like in Adler, Karmarkar, Resende, and Veiga [2]. It is also possible to manipulate the original problem to obtain directly a nearly central point, as is done in Gonzaga [36] or in Monteiro and Adler [89], but these approaches are inefficient in practice.

Although the initialization has great influence in the performance of the algorithms, it is beyond the scope of this paper. Other approaches are commented on in ?10.

The implementable algorithm. The algorithm below uses any of our three param- eterizations. It has a general stopping rule based on the duality gap A (x, p) defined in (42). This gap replaces A (p) in the conceptual Algorithm 4.6.

ALGORITHM 7.1. Implementable path-following: given e E (0,0.5], 3 E (0,1), an admissible initial parameter po and xo E So such that 6(x0, po) < E (For p = v, ,3 has limitations to be discussed in ?9.)

k := 0.

REPEAT Compute Pk+, by the update procedure in Table 1. Call the SSD or Newton-Raphson algorithm to find xk+l E SO such that 6(Xk+l, pk+l) < 6 k := k+ 1.

UNTIL L\(xkpk) < 2L.

7.1. Penalty function methods. Soon after Karmarkar published his algorithm, Gill, Murray, Sounders, Tomlin, and Wright [26] showed its similarity to the traditional barrier function method. Gonzaga [36] described a short-step algorithm that follows the central path based on the penalized function and solves (P) in O(frniL) SSD iterations, with an overall complexity of O(n3L) arithmetical computations.

The need for short steps was eliminated by Roos and Vial [99] and by Gonzaga [37], who proved a bound of O(nL) SSD iterations for parameter updates ak+1 := (1 + I)ak, where 1t is an arbitrary constant. The last reference proves a bound of O(fiiL) iterations if 1 = O(1/N/si).

We shall describe the short-step method from [36] and the large-step method from [37].

Short-step method. Constructing a short-step method is a straightforward applica- tion of Lemma 6.2(i). The algorithm is 7.1 with e = 0.5 and the update rule

ak+1 (i+ .=) ak.

Iteration k starts with xk near x(Cak), updates a and in one SSD iteration finds xk+l near ak+1. One iteration is represented in Fig. 7.1 (with inspiration in Larson's worms).

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 205

X(O(k

xk

FIG. 7.1. One iteration of the short-step algorithm.

LEMMA 7.2. Assume that the short-step algorithm starts with A(ao) < 2L. Then it stops in O(finL) SSD iterations.

Proof. From Lemma 6.2(i), the number of SSD iterations equals the number of outer iterations of the algorithm. From Lemma 4.8, the outer algorithm obtains A(ack) < 2-(L+1) in

(Iog(l + it))

iterations, where it = 0.1/xfi. But log(1 + it) > pt/(1 + it), and hence

K = O (L + 0) - O(10 L(V/ + 0.1)) = O(VrnL).

We conclude that after K = O(f/iiL) iterations, A(aK) < 2-(L+1). Using Lemma 5.1,

A(XK, aK) < ? < 2 = 2A(ak) < 2 aGK aGK

and the algorithm stops, completing the proof. O So, the updating rule, ak+1 := (1 + O.1/#/i)ak works well, but larger steps could

still lead to 6(xk, ak+l) < , with the consequence that 6(x, ak+l) < e after one SSD iteration. The "largest possible short steps" are obtained by computing ak+1 such that 6(Xk, ak+l) exactly equals xFi. This computation is easy, as can be seen in the complete algorithm below (never published before).

ALGORITHM 7.3. "Largest short steps": given aO > 0, e E (0, 1), xo E SO such that 6(x0, aO) <

k := 0.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

206 CLOVIS C. GONZAGA

REPEAT Scaling: A := AXk, cp : PAXkC, ep := PAe. Update: compute ak+1 such that IIak+lCop - epll = . Next point: y e - ak+1Cp + ep. Scaling: xk+1 : XkY. k :=k+1.

UNTIL L(Xk, ak) < 2L.

Note that all the steps have approximately the same length (the length of the worm in Fig. 7.1) in the Euclidean metric after scaling (see Lemma 5.4(ii)). In the original space, the steplength measured at each point in the metric induced by r X II X-1rII is approximately constant. This defines a Riemannian metric, and the length of the central path between a = 0 and a = 2-L in this metric will be bounded both below and above by numbers proportional to the number of steps of the algorithm above. This kind of reasoning is interesting when trying to find an algorithm with a lower complexity: if such an algorithm exists, it should take longer steps, perhaps based on the curvature of the path or on higher-order approximations.

Large-step method. A large-step method is obtained by the update rule

aGk+1 := ahk(i + IL), IL > 0.1/vrn.

The complexity study is very similar to the short-step case. In the lemma below we omit some details of the proof.

LEMMA 7.4. Assume that the short-step algorithm starts with A(ao) < 2L. Then it stops in O(jtnL) SSD iterations.

Proof. From Lemma 4.8, the number of outer iterations is bounded by

K=O (Ig(l + it) =0 (L /l.

By Lemma 6.2(ii) the number of SSD steps in each iteration is O(nJL2/(1 + X1)). Mul- tiplying this number by K, we immediately get the global bound of O(jtnL) SSD steps, completing the proof. [

Note that the theoretical complexity grows with the steplength. Two cases are inter- esting:

- "not-so-short steps": if it = v/# with v = 0(1), then the complexity is bounded by O(vw/;iL) = O(fiiL). So, the complexity of O(f iiL) is kept for any IL = 0(1/r) but it degrades as IL increases.

- large steps: if it = 0(1), then the complexity is O(nL). This last case was independently studied by Roos and Vial [99], who present a very

elegant complexity proof.

7.2. Methods of centers. The most natural setting for methods of centers is the for- mulation of the linear programming problem with inequality constraints, that can be obtained by dualizing the problem. In the next paragraph we take an informal look at a problem with inequality constraints, with a notation that differs from the rest of the paper.

minimize cTx

subject to Ax < b.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 207

In this formulation, the barrier function would be

p(x) = - log(b-Ax)i.

The center function for this formulation would be

fK(x) = -qlog(K - cTx) - ,log(b -Ax)i,

and this is the barrier function for the problem

minimize cTx

subject to Ax < b

cTx < K,

with the extra constraint repeated q times. This construction was made by Renegar [96]. Each iteration of a path-following al-

gorithm is now a search for the analytic center of a polytope, a well-understood problem with a simple geometry. The simplicity of this centering problem is probably the reason why this has been the first successful approach for path-following methods for linear pro- gramming, quadratically constrained quadratic programming, and convex programming (see the references cited in ?10).

A similar reduction of the centralization problem to the problem of finding an an- alytic center has been indicated in ?6 for the problem in primal formulation, with the introduction of slack variables for the constraint cTx < K. The conclusions for both formulations must be equivalent, although the dual formulation gives a better geomet- rical insight.

Essentially the same results as we obtained for the primal-dual formulation can now be replicated for the primal formulation: values of 3 near 1 produce short steps, with one internal iteration per centralization and a complexity of O(fiiL) NR iterations. This is the result obtained by Renegar [96], the first algorithm for linear programming with this low complexity. Updates with 3 = 1/(1 + O(1/#)) were used by den Hertog, Roos, and Terlaky [48], for the dual problem. They obtained "not-so-short" steps with the same complexity and properties as for the primal-dual approach.

8. Primal-dual methods. Primal-dual properties of the central path were first stud- ied by Megiddo [72], and motivated a primal-dual algorithm by Kojima, Misuno, and Yoshise [65]. The first version of their algorithm did not reduce the complexity obtained by Karmarkar, but improved variants developed by the same authors [64] and indepen- dently by Monteiro and Adler [89] reached the bound of O(f/iiL) iterations and O(n3L) arithmetical operations.

Consider the primal and dual problems from (7) and (8):

minimize zTx minimize xTz x z

subject to Ax = b subject to PAZ = PAC

x > 0 z > 0.

In the primal problem, z E ZO is a fixed dual feasible slack, and in the dual problem x E SO is a primal feasible solution.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

208 CLOVIS C. GONZAGA

The primal-dual penalized function is defined for a > 0 as

n

(54) So z E Z E (X z) = aXTZ + P(X) + P(Z) = axTz - logxizi.

Scaling. Given a positive diagonal matrix D, the scaled problems are obtained by the transformations x = Dx, z = D1z (see Lemma 3.2):

* * * T - ** * -T - minimize x minimize x z

(55) subject to ADx = b subject to PAD- = PADDC

x > 0 z > O.

It is easy to check by substitution that

(56) fa(x,z) = f,t(Dt,D-1z) = f,g(x, z).

Central points. The primal dual central points are the minimizers of fc, (x, z) for x E SO and z E Z?.

We have already associated to each a > 0 a primal central point x(a) and a dual slack z(a), defined in Lemma 4.8 as z(a) = x-1 (a)/a. Now we show that primal-dual central points are consistent with these definitions.

LEMMA 8.1. The following statements regarding a > 0 and an interior feasible pair (x, z) are equivalent:

(i) (x, z) form a primal-dual central point associated to a. (ii) Vxfc(x, z) = 0 and V_f,(x, z) = 0. (iii) aXZ = I. (iv) x = x(a) and z = z(a). Proof. Assume that (x, z-) is central. Fixing z-,

f, (x, z) = min f (x,I), X, xEXO

and we conclude that x = x(a). Computing gradients, we obtain for any z E Z?,

Vx fc (x, z) = az- , V_ fc, (x, z) =ax--z-

Substituting z = z(a) = x- 1/a in these expressions,

Vxf,(x, z) = V f,(x, z) = 0.

Since fc,(,) is strictly convex, this ensures that z is the unique minimizer of f, (x, .), and thus z = z = z(a). This proves the equivalence of (i), (ii), and (iv). The equiva- lence of (ii) and (iii) is immediate from the expressions for the gradients, completing the proof. [

Nearly central points. Condition (iii) in the lemma above provides a simple criterion for measuring centrality:

(57) 6(x, z, a) = IIaXz-ell.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 209

This criterion has a nice interpretation: when 6(x, z, a) = 0, xizi = 1/a for i - 1, 2, , n, and the complementary slacks are balanced. Otherwise, 6(x, z, a) measures the unbalance of the slacks. Note that when a -- oo, xizi -O 0 on the central path.

This proximity criterion is related to the primal proximity 6(x, a) by Lemma 5.3:

(58) 6(x, a) = 6(x, z(x, a), a) = min{6(x, z, a) I ATw + z = c, w E JRm},

where z(x, a) is the dual slack defined in (42). Thus 6(x, a) < 6(x, z, a). Besides this, if 6(x, z, a) < 1, then a better (in the sense of proximity) dual slack is provided by z(x, a).

Centering. Consider a > 0 and an interior pair (x, z). Finding (x(a), z(a)) from (x, z) can be seen either as one problem in lR2n or as two separate n-dimensional prob- lems.

The feasible set for the 2n-dimensional problem is interesting: each feasible direc- tion (hx, h_) is such that hx E Nr(A) and h_ E N(PA) = 1Z(AT) I Nr(A). The steepest descent direction for foc( ( *) from (x, z) is given by (hx, he), where

(59) hx = -PAVxf.(X,Z), hz = -PAVzfa(X,Z),

where PA = I - PA. We now describe several options for descent directions. Independent scaling. The first option is generating independent SSD directions for

the functions fo,(x) and f, (., z):

(60) hx= -XPAXXVxf,(X, z) = -XPAx(aXz -e),

hz= -ZPAZ-1ZVzfc,(x,z) = -ZPAZ-1(aZx-e).

Here hx is obtained by scaling the primal problem about x, and hz by scaling the dual problem about z (see (55)).

Although these may be the best possible directions (they are the Newton directions for each problem separately), using them for centering with a fixed a is not reasonable for two reasons. First because projections into two different spaces must be computed, doubling the work per iteration in comparison to primal only centralization. The second reason is more striking: the primal and dual centralization problems are totally uncou- pled in the sense that the sequence of primal solutions (xk) generated by the SSD method with any reasonable line search (fixed steplength, fixed percentage to the boundary or one-dimensional minimization) will be totally independent of the dual variables (zk). This is so because variations in z affect fQ,, ( Z) by a constant value. In fact, it is easy to see that hx in (60) is not affected by z since PAXXZ = PAXXC by Lemma 3.1. If (xk) approaches the central point x(a), then a good approximation z(xk, a) to z(a) is available from (42), and the dual iterations are useless. This may not be true in potential reduction methods or if a is allowed to change during the line searches.

Joint scaling. Duality is preserved if the problems are jointly scaled by a positive di- agonal matrix D, with x = Dx and z = DQz, as in (55). The steepest descent directions will be

(61) hx = -DPADDVxf,(x, z) = -DPADD(az -x-1),

= -D-1PADD-1Vzf,(x, z) = -D-1 PADD-1(ax Z-1).

Primal scaling. Using D = X, (X, z) is mapped in (e, Xz). The primal direction coincides with h(x, a), the NR direction for the primal problem, but there is no reason

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

210 CLOVIS C. GONZAGA

to believe that h_ is efficient. The same projection operation is applied to two different vectors, with a great reduction in work as compared to independent scaling.

Primal-dual scaling. Nice results are obtained by the following scaling, known as primal-dual scaling. Set

D =X2Z-2.

Now x and z are mapped into D-1x = Dz = (ZX) 2 e. The SSD directions are

(62) hx = -DPAD(a(XZ)!e - (XZ) ie) hz = -D 1PAD(a(XZ)2e - (XZ)-2e).

Only one projection is needed, since the primal and dual scaled directions are, respec- tively, projection and orthogonal complement of the same vector. As we shall see in a moment, these are the Newton directions for the optimality condition aXZe - e = 0. They were used by Kojima, Mizuno and Yoshise [65], by Monteiro and Adler [89], and by most studies in primal-dual penalty and potential reduction methods.

With a -- oo, we obtain the primal-dual affine scaling directions

hx = -DPAD(XZ) e, hz = -D1PAD(XZ) e.

Monteiro, Adler, and Resende [88] obtained a polynomial algorithm by taking very short steps along these directions. This is presently the only polynomial-time interior point method that is not based on any auxiliary function.

Newton-Raphson directions. The optimality conditions for the centering problem are given in Lemma 8.1, and can be written in several different equivalent ways. Each different way of writing the condition will generate a different pair of Newton directions, studied by Ye [120].

aXZe-e = 0, aZe-X-le = 0, aXe-Z-le = 0,

(63) Ax = b, Ax = b, Ax = b,

ATw + z = ATWZ = c, A Tw + z = c.

Although the systems above are equivalent, their solutions by Newton's method behave differently. The last two equations in each system simply mean that the resulting direc- tions hx and hz must be, respectively, primal and dual feasible.

(i) Primal-dual scaling. The Newton directions for the first system in (63) are given by

Zhx + Xhz = -XZe +-e, a Ahx = 0,

hz ATy.

This system is solved by the directions in (62), as can be easily checked. Instead of check- ing the given solution, the system can be solved by a change of scale and projection, as we shall do in the next case.

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING 211

(ii) Primal scaling. The Newton equations for the second system in (63) are

ah_? + X 2h, = -az+x-1,

Ahx = 0,

hz = ATY.

The system is solved by scaling: setting x = Xx, z = X-1z and simplifying,

ahz+hx = -a,z+e,

hx E AJ(AX),

hz I AJ(AX).

Computing now the projection into Nr(AX) and its orthogonal complement, with PAXhx = hx and PAXhZ = hz,

hx =P-Px(aX - e), ahz =-PAX(aZ - e),

or finally,

(64) hx = -XPAX(aXZ - e), ahz = -X1PAX(aXZ - e).

The direction hx coincides with the primal SSD direction in (60), but hz does not coincide with the corresponding direction in (59).

(iii) Dual scaling. A similar treatment applied to the third system in (63) leads to

(65) ahx = -Z1PAZ-1(aZx - e), hz =-ZPAZ-1(aZx - e).

Here hz coincides with the dual SSD direction in (61). These directions were classified by Ye [120], and no comparative study is available

yet. Primal-dual scaling was implemented by McShane, Monma, and Shanno [69], who report excellent results.

The path-following algorithms. Path-following methods are constructed in the usual way.

ALGORITHM 8.2. Implementable primal-dual path-following: given E E (0, 0.5], 3 E (0, 1), an admissible initialparameter a0o, xo E So and zo E ZO such that 6(xO, z?, ao) < E k := 0.

REPEAT ak+1 := ak/f. Call the SSD or Newton-Raphson algorithm to find a feasible pair (xk+l, zk+l) such that 6(x k+1, zk+ , ak+l) <E. k := k + 1.

UML ZkT xk < 2-L

Here the internal algorithm can use any of the directions described above. If primal- dual scaling is used, then short steps lead to a complexity of O(ViiL) internal iterations.

The proximity to the central path. We defined two measures of proximity from an interior point to a centralpoint, namely 6(x, a) and 6(x, z, a), associated, respectively, to

This content downloaded from 200.17.211.124 on Mon, 21 Sep 2015 16:22:39 UTCAll use subject to JSTOR Terms and Conditions

212 CLOVIS C. GONZAGA

the primal and primal-dual central paths. The proximity to the centralpath (in contrast to centralpoint) can be defined in both cases as

6(x) = min 6(x, a),

6(x, z) = min 6(x, z, a). a

The solution of each of these problems defines a parameter value a such that x(a) or (x(a), z(a)) is the "nearest" central point in relation to x or (x, z), respectively.

The computation of 6(x) was made in [33], resulting in

T a CpTep

& llCp112'

where cP = PAXXC and ep = PAxe. The scaled SSD direction corresponding to this value of a is interesting: it is the centering direction used by Barnes, Jensen, and Chopra [8], the SSD direction for the centering problem restricted to a constant-cost slice.

The primal-dual 6(x, z) was used by Kojima, Mizuno, and Yoshise [65] and by Mon- teiro and Adler [89]. The computation is very simple in this case:

6(x, z)2 = min IIaZXe - el2 a

Differentiating with respect to a and equating to zero, we obtain

1 xTz a n

This is used in algorithms as follows: at each path-following iteration, instead of setting ak+1 := ak/fl as in Table 1, set

in ak+1 = k -

A non-path-following algorithm (possibly not polynomial) is obtained by using this rule before all SSD iterations, instead of keeping a constant until a nearly central point is obtained.

9. Potential reduction methods. Potential reduction algorithms work either with the primal potential function defined for lower bounds v < v and for a constant q > n:

(66) x E S0 - fV(x) = qlog(cTx - v) +p(x),

or with the primal-dual potential function

(67) x E S?, Z E Z? Fq(X, Z) = q 1og xTz + P(X) + P(Z).

We shall classify the methods in three types: (i) primal methods, based in (66); (ii) primal methods that use (67) in the convergence analysis or in updating lower

bounds, but keep no memory of dual variables; (iii) and primal-dual methods. The first primal potential reduction method is Karmarkar's algorithm. With Todd

and Burrell's lower bound updates it is a non-path-following algorithm, described in

§3.6. It is based on projective geometry and was the last one to have its complexity reduced to O(√n L) iterations. We comment on this below.

Primal affine polynomial methods were first described in [41] with the following result: the SSD algorithm applied to the potential function with q ≥ n + √n, q = O(n), and v equal to the optimal cost solves the linear programming problem with the same complexity as Karmarkar's algorithm. Ye [118] showed how to use approach (ii) to obtain methods with a complexity of O(√n L) iterations. He developed a beautiful complexity analysis that we shall briefly describe in §9.2. Although the first version of his method uses short steps, it is easy to extend his analysis to "not-so-short" and large steps, with conclusions similar to what we obtained for penalty function methods. Further developments of Ye's approach were made by Freund [19], by Anstreicher and Bosch [7], and by Ye [120]. Primal-only methods were extended to large steps in [38], in an approach commented on below.

Ye's method is a path-following method in our definition, since it only updates the lower bounds (or dual variables) at nearly central points. But his analysis opens the way to real primal-dual methods that work directly with the primal-dual potential function, keeping the low complexity without resorting to the central path. Such methods, developed by Kojima, Mizuno, and Yoshise [63] and by Gonzaga and Todd [42], will be commented on in §9.3.

9.1. Primal-only methods. Algorithm 7.1 will be specialized here for the potential function. There will be a change in notation: the iteration counter k will be incremented at each internal iteration, and not when a nearly central point is reached. The algorithm below is nothing but Algorithm 7.1 in a slightly different presentation.

ALGORITHM 9.1. Potential reduction path-following: given ε ∈ (0, 0.5], β ∈ (0, 1), a lower bound v_0 for the optimal cost, and x^0 ∈ S^0.

k := 0.

REPEAT
    Direction: compute
        h(x^k, v_k) := -P_{AX_k}((q/(c^T x^k - v_k)) X_k c - e),    h^k := X_k h(x^k, v_k),
        δ(x^k, v_k) := ‖h(x^k, v_k)‖.

    Line search: x^{k+1} := x^k + λ h^k.
    Update: If δ(x^k, v_k) ≤ ε then v_{k+1} := βv_k + (1 - β)c^T x^k;
        else v_{k+1} := v_k.
    k := k + 1.
UNTIL c^T x^k - v_k < 2^{-L}.
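A minimal Python sketch of this algorithm follows, assuming the barrier term p(x) = -Σ_i log x_i and the projected-gradient form of the SSD direction used above (our reading of the garbled display); the crude grid line search and the helper names are ours.

    import numpy as np

    def ssd_direction(A, c, x, v, q):
        # Scaled projected gradient of f_v(x) = q*log(c^T x - v) - sum_i log x_i.
        AX = A * x                                    # A @ diag(x)
        g = (q / (c @ x - v)) * (x * c) - 1.0
        return -(g - AX.T @ np.linalg.solve(AX @ AX.T, AX @ g))

    def f_v(c, x, v, q):
        return q * np.log(c @ x - v) - np.sum(np.log(x))

    def potential_reduction_path_following(A, c, x, v, q, beta=0.95, eps=0.5, tol=1e-8):
        while c @ x - v > tol:
            h = ssd_direction(A, c, x, v, q)
            # crude grid line search on f_v along x + lambda * X h
            best_t, best_val = 0.0, f_v(c, x, v, q)
            for t in np.linspace(0.01, 1.0, 100):
                xt = x + t * (x * h)
                if np.all(xt > 0) and c @ xt > v and f_v(c, xt, v, q) < best_val:
                    best_t, best_val = t, f_v(c, xt, v, q)
            x = x + best_t * (x * h)
            if np.linalg.norm(h) <= eps:
                # nearly central: shrink the gap; beta must satisfy (68) so that
                # the new v remains a valid lower bound
                v = beta * v + (1.0 - beta) * (c @ x)
        return x, v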

Let v_k = v be the lower bound at an iteration in which the condition δ(x, v) ≤ ε is satisfied. The lower bound update will be

c^T x - v' = β(c^T x - v).

Here β is not free, because v' must still be a lower bound for the optimal cost. An easy solution to this is provided by Lemma 5.1: make sure that

β(c^T x - v) ≥ Δ(x, v).

Using (43) with |e^T h| ≤ √n ‖h‖ ≤ √n ε,

Δ(x, v) ≤ ((n + √n ε)/q)(c^T x - v).

The condition above is ensured by setting

(68)    β ≥ (n + √n ε)/q.

The choice of q. Gonzaga [38] suggests q > 2n. The reason for this lies in the condition number of ∇²f_v(x) near a central point x(v) = e (after scaling). If this number approaches 1, then SSD approaches the behavior of Newton's method, with two consequences: from the practical point of view, centering should be more efficient; from the theoretical point of view, the good conditioning allows the extension of the conclusions in Lemma 5.4 about proximity. Choosing q and β so that βq > 2n, the condition number of the Hessian near a central point will be better than 2, and the proximity becomes well defined. Now, for well-chosen values of ε, the path-following algorithm achieves a complexity of O(√n L) iterations whenever 1/β = 1 + O(1/√n), as we had for the penalty function methods.

Large steps with a complexity of O(nL) iterations are obtained by updating v at all iterations (not necessarily near central points), as we did in Karmarkar's algorithm:

    If v(x^k) > v_k then set v_{k+1} := v(x^k); else set v_{k+1} := v_k.

9.2. Ye's approach. The primal-dual potential function was first studied by Todd and Ye [110], and an O(√n L)-iteration algorithm was developed by Ye [118]. The function has very nice features that we describe now.

LEMMA 9.2. Consider the primal-dual potential function (67) with q = n, and let (x, z) be an interior feasible primal-dual pair. Then F_n(x, z) ≥ n log n, and equality occurs if and only if (x, z) is central.

Proof. Let (x, z) be an interior feasible pair. Then

F_n(x, z) = n log x^T z - Σ_{i=1}^n log x_i z_i

          = n log [ x^T z / (Π_{i=1}^n x_i z_i)^{1/n} ]

          = n log n + n log [ ((Σ_{i=1}^n x_i z_i)/n) / (Π_{i=1}^n x_i z_i)^{1/n} ].

The fraction is the ratio between the arithmetic and geometric means of the positive numbers x_i z_i. This is well known to be greater than or equal to 1, with equality occurring if and only if for some θ > 0, x_i z_i = θ, i = 1, 2, …, n. But this is precisely the centrality condition in Lemma 8.1, completing the proof. □
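The inequality in Lemma 9.2 uses only the positivity of the products x_i z_i, so it can be checked numerically on arbitrary positive vectors; the feasibility assumptions of the lemma play no role in the bound itself. A small Python illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    x, z = rng.random(n) + 0.1, rng.random(n) + 0.1
    F_n = n * np.log(x @ z) - np.sum(np.log(x * z))
    print(F_n >= n * np.log(n))                 # True: arithmetic mean >= geometric mean

    # Equality when all products x_i z_i are equal (the centrality condition)
    xc, zc = np.ones(n), 2.0 * np.ones(n)
    print(n * np.log(xc @ zc) - np.sum(np.log(xc * zc)), n * np.log(n))   # both equal n*log(n)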

Now consider a sequence of primal-dual pairs (x^k, z^k). We shall study sequences with the following characteristics for q > n:

(i) F_q(x^0, z^0) = O((q - n)L).
(ii) For k = 0, 1, 2, …, F_q(x^{k+1}, z^{k+1}) ≤ F_q(x^k, z^k) - θ, where θ = O(1) is a positive constant.

LEMMA 9.3. In a sequence (x^k, z^k) with the characteristics above, the condition x^{kT} z^k ≤ 2^{-L} is reached for k = O((q - n)L).

Proof. For a sequence (x^k, z^k) with these characteristics and k > 0,

F_q(x^k, z^k) ≤ F_q(x^0, z^0) - kθ.

Using the definition of the potential function,

(q - n) log x^{kT} z^k + F_n(x^k, z^k) ≤ F_q(x^0, z^0) - kθ.

From Lemma 9.2, F_n(x^k, z^k) ≥ n log n > 0, and hence

(q - n) log x^{kT} z^k ≤ F_q(x^0, z^0) - kθ.

The condition log x^{kT} z^k ≤ -L is ensured for F_q(x^0, z^0) - kθ ≤ -(q - n)L, or kθ ≥ F_q(x^0, z^0) + (q - n)L = O((q - n)L), completing the proof. □

We conclude that any algorithm that generates sequences with the characteristics above reaches the stopping condition x^T z < 2^{-L} in O((q - n)L) iterations. If q = n + O(√n), then the algorithm complexity will be O(√n L) iterations.

The condition (i) is not strong. It is satisfied, for instance, if x^0 is nearly central for a penalty parameter α = O(√n 2^{-L}). An algorithm that allows F_q(x^0, z^0) = O(2^L) and still has O(√n L)-iteration complexity was reported by Mizuno [80].
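The counting argument behind Lemma 9.3 reduces to simple arithmetic; the toy Python illustration below uses numbers that are not from the paper.

    import numpy as np

    # With q - n = sqrt(n), a constant decrease theta per iteration and an initial
    # potential of order (q - n)L, the gap x^T z falls below 2^{-L} after
    # k = O((q - n)L) = O(sqrt(n) L) iterations.
    n, L, theta = 100, 50, 0.25
    q = n + np.sqrt(n)
    F0 = (q - n) * L                                # assumed starting potential of the right order
    k = (F0 + (q - n) * L * np.log(2.0)) / theta    # k*theta >= F0 + (q - n) L log 2
    print(int(np.ceil(k)))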

The algorithm. A first version of Ye's algorithm is identical with Algorithm 9.1

above, with the following updating rule:

    Update: If δ(x^k, v_k) ≤ ε then v_{k+1} := v(x^k, v_k) (see (42));

    else v_{k+1} := v_k. The gap reduction is

c^T x^k - v_{k+1} = Δ(x^k, v_k) = ((n - e^T h(x^k, v_k))/q)(c^T x^k - v_k),

similar to Algorithm 9.1. A second version of Ye's algorithm is equivalent to the first one, but uses dual slacks

explicitly. Note that z^{k+1} does not depend on z^k.

ALGORITHM 9.4. Potential reduction path-following, Ye [118]: given ε ∈ (0, 0.4], x^0 ∈ S^0, z^0 ∈ Z^0.

k := 0.
REPEAT

Direction: compute

        h := -P_{AX_k}((q/(z^{kT} x^k)) X_k z^k - e),

        h̄ := X_k h,    δ := ‖h‖.

    Line search: x^{k+1} := x^k + λ h̄.
    Update: If δ ≤ ε then

        z^{k+1} := z(x^k, α),  with α := q/(z^{kT} x^k),

    else z^{k+1} := z^k.
    k := k + 1.

UNTIL z^{kT} x^k < 2^{-L}.
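The primal direction of Algorithm 9.4 is a single projection. The Python sketch below assumes the coefficient q/(z^T x) in the direction (our reading of the garbled display, equivalent to setting α = q/(x^T z) in the penalty-type directions); the helper name is ours.

    import numpy as np

    def ye_primal_direction(A, x, z, q):
        # Scaled projected gradient of F_q at (x, z) with alpha = q / (z^T x):
        #   h = -P_AX( (q / z^T x) X z - e ).
        AX = A * x                                    # A @ diag(x)
        g = (q / (z @ x)) * (x * z) - 1.0
        h = -(g - AX.T @ np.linalg.solve(AX @ AX.T, AX @ g))
        return h, np.linalg.norm(h)                   # scaled direction and the proximity delta

    # The step is then taken along X h in the original space: x_new = x + lam * x * h.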

It is easy to see that this version is equivalent to the first, since for any x ∈ S^0, x^T z(x^k, α) = c^T x - v(x^k, α). The dual update keeps no memory of z^k, and thus the algorithm should be seen as a primal method.

The dual update uses the construction in (42):

z^{k+1} := z(x^k, α) = X_k^{-1}(c_p + (1/α)(e - e_p)),

where c_p = P_{AX_k}X_k c and e_p = P_{AX_k}e. A deeper update can be obtained by setting z^{k+1} := z(x^k), as defined in (26). The gap associated with this slack will be Δ(x^k) = min{x^{kT} z(x^k, α) | α > 0, z(x^k, α) ≥ 0}, as in (44). This update, the deepest possible with dual variables in the format discussed in §3.5, produces useless dual slacks, since z(x^k) ∉ Z^0. Instead of looking for the minimum gap using dual variables in this format, Ye minimized the primal-dual potential function, obtaining the final version of his algorithm:

    Update: If δ ≤ ε then
        z^{k+1} := argmin_z {F_q(x^k, z) | z = X_k^{-1}(c_p + μ(e - e_p)), μ ∈ ℝ, z > 0}
    else z^{k+1} := z^k.
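The following Python sketch of this final update assumes p(·) = -Σ log(·) in F_q and replaces the exact one-dimensional minimization by a crude grid in μ (restricted to positive values); the function name and the grid are ours.

    import numpy as np

    def ye_dual_update(A, c, x, z_old, q):
        # Minimize F_q(x, z) over the family z(mu) = X^{-1}(c_p + mu(e - e_p)), z(mu) > 0.
        AX = A * x
        def project(v):
            return v - AX.T @ np.linalg.solve(AX @ AX.T, AX @ v)
        c_p, e_p = project(x * c), project(np.ones_like(x))
        def F_q(z):
            return q * np.log(x @ z) - np.sum(np.log(x)) - np.sum(np.log(z))
        best_z, best_val = z_old, F_q(z_old)
        for mu in np.geomspace(1e-4, 1e4, 4001):      # only positive mu values are tried here
            z = (c_p + mu * (1.0 - e_p)) / x
            if np.all(z > 0) and F_q(z) < best_val:
                best_z, best_val = z, F_q(z)
        return best_z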

Complexity. Ye proves that if q > n + √n, then both primal and dual iterations produce a decrease of at least a constant in the potential function. Lemma 9.3 is then used to prove a complexity of O(√n L) iterations for q = n + O(√n). If q = n + √n, as in Ye's original paper, the algorithm will evolve by short steps, the "largest short steps" with the enhanced lower bound computations.

No way has been found so far of making the SSD algorithm take a single iteration between dual updates for reasonable changes in v. This is due to the bad conditioning of the potential function.

9.3. Primal-dual algorithms. Lemmas 9.2 and 9.3 pave the way for primal-dual algorithms. Instead of taking dual updates in the format z = X^{-1}(c_p + μ(e - e_p)), one can actually do primal-dual iterations to reduce F_q(x, z).

Any of the primal-dual directions studied in §8 for penalty function methods can be immediately adapted to potential reduction algorithms simply by setting

α := q/(x^{kT} z^k).

Two approaches have been used:

(i) Kojima, Mizuno, and Yoshise [63] used the primal-dual scaling to generate from (x, z) the direction (h_x, h_z) in (62).

(ii) Gonzaga and Todd [42] use the independent SSD directions (60), but not simultaneously. The algorithm performs primal iterations while x^k is not nearly central, i.e., while ‖X_k^{-1} h_x‖ > ε. If this condition fails, the algorithm switches to dual iterations, and stays dual while a similar condition holds for the dual points. It is proved that z ∈ Z^0 and x ∈ S^0 cannot be simultaneously nearly central for q > n + √n, and this drives the algorithm. The convergence proof is trivial, since a sound descent is guaranteed at primal or dual points that are not nearly central, and Lemma 8.2 can be used. Note that the method can switch between primal and dual arbitrarily, as long as it avoids nearly central points.

Line searches. In primal-dual methods each iteration computes a primal-dual direction (h_x, h_z) and does a line search to reduce F_q(x^k + λ_x h_x, z^k + λ_z h_z). If we follow

the direction in ℝ^{2n}, then λ_x = λ_z, and the line search is just a normal one. But it seems convenient to use the extra degree of freedom and perform a bidirectional search, varying λ_x and λ_z independently. An example where bidirectional search is much more efficient was described by Ye [120].
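A minimal Python sketch of such a bidirectional search follows; the grid search over (λ_x, λ_z) and the function name are ours, and F stands for any merit function such as F_q.

    import numpy as np

    def bidirectional_search(F, x, z, hx, hz, steps=21):
        # Grid search over independent primal and dual step lengths (lam_x, lam_z),
        # instead of the single common step used when following the direction in R^{2n}.
        grid = np.linspace(0.0, 1.0, steps)
        best = (0.0, 0.0, F(x, z))
        for lx in grid:
            for lz in grid:
                xt, zt = x + lx * hx, z + lz * hz
                if np.all(xt > 0) and np.all(zt > 0):
                    val = F(xt, zt)
                    if val < best[2]:
                        best = (lx, lz, val)
        return best[0], best[1]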

These primal-dual methods obtain a complexity of O(√n L) iterations for q = n + O(√n). But they are not path-following algorithms. The first one does not mention the central path; the second one avoids jamming near the path, and makes no attempt to approach it.

This suggests an interesting interpretation of the features that ensure low complexity: a good use of primal-dual properties and careful steps. Staying near the central path is not necessary, but it is there that good dual slacks are found. In primal-only methods the vicinity of the central path is the only region where we know how to find such slacks.

9.4. Projective algorithms. The first projective algorithm to achieve a proved O(√n L)-iteration complexity was published by Ye [119], using an approach similar to the one in §9.2 above.

Much earlier, Anstreicher [4] described a different way of generating lower bounds for projective algorithms, based on the minimization of a fractional cost in the ball that circumscribes the unit simplex. Xiao and Goldfarb [117] revisited Anstreicher's algorithm five years later, and proved that it could be tuned into a short-step path-following algorithm with O(√n L) complexity.

10. Concluding remarks. This paper was limited to the description of the basic ideas used in path-following algorithms, with results and methods that can already be considered stable. The field of interior point methods is still extremely active, with new results appearing in a continuous stream. In this section we give an incomplete guide to the literature on subjects that we left out, and on extensions to nonlinear problems.

Initialization and termination. An initial feasible solution can be obtained by the introduction of an extra variable w in the problem, and by writing the constraints as Ax - w(Ax^0 - b) = b, with w^0 = 1 and an arbitrary x^0 > 0. Now the vector (x^0, w^0) is feasible, and w must be driven to zero to find a feasible solution for the original problem.
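A minimal Python sketch of this augmentation follows; the cost M assigned to w is a parameter of the construction (for instance the big-M cost discussed next), and the function name is ours.

    import numpy as np

    def augment_with_artificial(A, b, c, x0, M):
        # Append the artificial column b - A x0 with cost M:
        #   A x + w (b - A x0) = b,  so (x0, w0 = 1) is feasible for any x0 > 0.
        col = (b - A @ x0).reshape(-1, 1)
        A_aug = np.hstack([A, col])
        c_aug = np.append(c, M)
        return A_aug, c_aug, np.append(x0, 1.0)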

The classical approach associates with w a large cost M (big-M method). The algorithms start with a phase 1 that terminates when w is reduced to zero. This is used, for instance, by Adler, Karmarkar, Resende, and Veiga [2].

Phase 1-phase 2 methods are methods that simultaneously seek feasibility and optimality. The first such method was developed by de Ghellinck and Vial [25] in the projective framework. Methods that work like optimization with multiple criteria were described by Anstreicher [5] and by Todd and Wang [109] for projective methods and by Anstreicher [5] using affine potential reduction methods. Freund [21], [20] used a shifted barrier approach that allows initialization from a point that satisfies only the equality constraints, but can have negative components.

Termination of the algorithm requires finding a vertex solution from a near-optimal solution. The use of a purification algorithm, as in Kortanek and Zhu [66], amounts to doing simplex-like operations, which may be inefficient. Mehrotra [74] solves the problem by a random perturbation of the cost vector.

Properties of the trajectories. Geometrical and analytical properties of the central path were studied by Bayer and Lagarias [10] and by Megiddo [72] very early, and their studies were fundamental to the creation of path-following methods. The trajectories generated by the vector fields of various algorithms are studied by Megiddo and Shub [73] (affine-scaling), Adler and Monteiro [1] (affine-scaling), Monteiro [85] (projective scaling) and

Monteiro [86] (affine potential reduction). See also Witzgall, Boggs, and Domich [116]. Homotopy continuation methods were studied for instance by Nazareth [92], Kojima, Megiddo, and Noma [61], and by Sonnevend and Stoer [105].

All methods studied in this paper are first-order methods. The directions followed by the various algorithms coincide at a point on the central path: all methods then take the direction tangent to the path. Monteiro, Adler, and Resende [88] modified the primal-dual affine scaling algorithm to use power series extensions, with a consequent reduction in the number of iterations (and an increase in the time per iteration). A similar approach was taken by Mehrotra [75] and implemented by himself and by Lustig, Marsten, and Shanno [68], with very good results. Higher-order methods were also tested by Jarre [53].

Quadratic programming. In this paragraph we briefly comment on quadratic programming methods not based on solutions of the Linear Complementarity Problem, to be discussed below.

The first proof of polynomial solvability of convex quadratic problems was by Kozlov, Tarasov, and Khachiyan [67], based on the ellipsoidal method. Interior point methods for solving linearly constrained quadratic problems were first used by Kapoor and Vaidya [54] and independently by Ye and Tse [122]. These methods did not follow the central path.

Path-following methods with a complexity of O(√n L) NR iterations were developed by Mehrotra and Sun [76], Goldfarb and Liu [31], Monteiro and Adler [90], Ben Daya and Shetty [11] and recently by den Hertog, Roos, and Terlaky [46], the first to allow large steps. Quadratically constrained problems were solved by Jarre [52] and later but independently by Mehrotra and Sun [78].

Linear complementarity problems. The first O(√n L)-iteration algorithm for solving the LCP with a positive semi-definite matrix was developed by Kojima, Mizuno, and Yoshise [64], who also show how to obtain a bound of O(n³L) arithmetical computations. Their algorithm is a path-following method based on the penalty function approach. The same authors [63] later developed a potential reduction algorithm with the same complexity. Variants of these two approaches were published by Mizuno [81], [80], Mizuno, Yoshise, and Kikuchi [83], and Mizuno and Todd [82], who derived a long steps method. Kojima, Megiddo, Noma, and Yoshise [62] published an extensive study of interior point methods for LCPs, where they present a unified theory and survey the literature.

Convex programming problems. Path-following algorithms for convex programming problems that are polynomial in a certain sense were developed independently by Monteiro and Adler [87], by Jarre [53] and by Mehrotra and Sun [77]. Contributions were made by Nesterov and Nemirovsky [93], Sonnevend [104], and by Sonnevend and Stoer [105]. Den Hertog, Roos, and Terlaky [47] developed an algorithm related to Jarre's approach, and recently the same authors [45] described a very simple large steps penalty function method. All these methods rely on a certain Lipschitz condition on the second derivatives of the objective function.

Algorithms for nondifferentiable convex optimization were studied by Goffin [27], [28], Goffin and Vial [30], and Goffin, Haurie, and Vial [29]. Vaidya [113] proved a low polynomial bound for the convex feasibility problem using a cutting-plane algorithm that does not depend on any Lipschitz condition.

Combinatorial problems. The application of interior point ideas to combinatorial problems is still very limited. Branch-and-cut methods have been very successful in the solution of integer programming problems (see, for instance, Grötschel and Holland [43]), using a simplex algorithm for solving the linear relaxations. Mitchell and Todd [79] applied Karmarkar's algorithm in the solution of the relaxations, but the approach has a

serious difficulty: each time a new cut is found, the present iterate becomes infeasible, and an efficient initialization of the interior point algorithm is difficult. New results on this topic should appear in the future, based on the ongoing research on initialization commented on above.

A totally different approach to combinatorial problems has been taken by Karmarkar, Resende, and Ramakrishnan [56], based on the application of a trust region method to a potential function associated with the zero-one feasibility problem.

We would also like to cite some general treatments and surveys. Goldfarb and Todd [32] published an authoritative presentation of linear programming from its beginnings to the first path-following methods. Den Hertog and Roos [44] went through the careful work of classifying all directions used in interior point methods for linear programming. Nesterov and Nemirovsky [94] wrote a book on their "self-concordant functions," which abstract the mathematical properties that lead to polynomial methods. This is a general class of functions for which properties like the ones in Lemma 5.4 can be derived. The use of these functions allows very elegant treatments of central path algorithms and their extensions to nonlinear problems.

REFERENCES

[1] I. ADLER AND R. MONTEIRO, Limiting behaviour of the affine scaling continuous trajectories for linear programming problems, Math. Programming, 50 (1991), pp. 29-51.

[2] I. ADLER, M. RESENDE, G. VEIGA, AND N. KARMARKAR, An implementation of Karmarkar's algorithm for linear programming, Math. Programming, 44 (1989), pp. 297-335. Errata in Math. Programming, 50 (1991), p. 415.

[3] K. ANSTREICHER, Analysis of a modified Karmarkar algorithm for linear programming, Tech. Rep. Series B 84, Yale School of Organization and Management, New Haven, CT, 1985.

[4] , A monotonic projective algorithm for fractional linear programming, Algorithmica, 1 (1986), pp. 483-498.

[5] , A combined phase I - phase II scaled potential algorithm for linear programming, CORE discussion paper 8939, CORE, Université Catholique de Louvain, Louvain, Belgium, 1989, Math. Programming, to appear.

[6] , On long step path following and SUMT for linear and quadratic programming, Dept. of Operations Research, Yale University, New Haven, CT, 1990, manuscript.

[7] K. ANSTREICHER AND R. BOSCH, Long steps in an O(n³L) algorithm for linear programming, Yale School of Organization and Management, New Haven, CT, 1989, manuscript; Math. Programming, to appear.

[8] E. BARNES, D. JENSEN, AND CHOPRA, A polynomial-time version of the affine-scaling algorithm, New York University, New York, NY, 1988, manuscript.

[9] E. R. BARNES, A variation on Karmarkar's algorithm for solving linear programming problems, Math. Programming, 36 (1986), pp. 174-182.

[10] D. BAYER AND J. C. LAGARIAS, The non-linear geometry of linear programming, I. Affine and projective scaling trajectories, II. Legendre transform coordinates, III. Central trajectories, AT&T Bell Laboratories, Murray Hill, NJ, 1986, preprint.

[11] M. BEN DAYA AND C. M. SHETTY, Polynomial barrier function algorithms for convex quadratic programming, Arabian J. Sci. Engrg., 15 (1990), pp. 657-670.

[12] R. BLAND, D. GOLDFARB, AND M. TODD, The ellipsoid method: A survey, Oper. Res., 29 (1981), pp. 1039-1091.

[13] K. BORGWARDT, The Simplex Method: A Probabilistic Analysis, Springer-Verlag, Berlin, 1987.

[14] G. DANTZIG, Maximization of a linear function of variables subject to linear inequalities, in Activity Analysis of Production and Allocation, T. C. Koopmans, ed., John Wiley, New York, 1951, pp. 339-347.

[15] G. B. DANTZIG, Linear Programming and Extensions, Princeton University Press, Princeton, NJ, 1963.

[16] I. I. DIKIN, Iterative solution of problems of linear and quadratic programming, Soviet Math. Dokl., 8 (1967), pp. 674-675.

[17] , On the speed of an iterative process, Upravlyaemye Sistemy, 12 (1974), pp. 54-60.

[18] A. FIACCO AND G. MCCORMICK, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley, New York, 1968.

[19] R. M. FREUND, Polynomial-time algorithms for linear programming based only on primal scaling and projected gradients of a potential function, Math. Programming, 51 (1991), pp. 203-222.

[20] , A potential-function reduction algorithm for solving a linear program directly from an infeasible 'warm start,' Tech. Rep. 3079-89-MS, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, 1989.

[21] , Theoretical efficiency of a shifted barrier function algorithm for linear programming, Linear Algebra Appl., 152 (1991), pp. 19-41.

[22] K. R. FRISCH, The logarithmic potential method of convex programming, University Institute of Economics, memorandum, Oslo, Norway, 1955.

[23] P. GÁCS AND L. LOVÁSZ, Khachiyan's algorithm for linear programming, Math. Programming Study, 14 (1981), pp. 61-68.

[24] D. GAY, A variant of Karmarkar's linear programming algorithm for problems in standard form, Math. Programming, 37 (1987), pp. 81-89.

[25] G. D. GHELLINCK AND J.-P. VIAL, A polynomial Newton method for linear programming, Algorithmica, 1 (1986), pp. 425-453.

[26] P. GILL, W. MURRAY, M. SAUNDERS, J. TOMLIN, AND M. WRIGHT, On projected Newton barrier methods for linear programming and an equivalence to Karmarkar's projective method, Math. Programming, 36 (1986), pp. 183-209.

[27] J.-L. GOFFIN, Affine and projective methods in nondifferentiable optimization, in Trends in Mathematical Optimization: Proceedings of the 4th French-German Conference on Optimization in Irsee, West Germany, April 1986, K. H. Hoffmann, J. B. Hiriart-Urruty, C. Lemarechal, and J. Zowe, eds., Internat. Ser. Numer. Math., 84, Birkhäuser Verlag, Basel, Switzerland, 1988, pp. 79-91.

[28] , Affine and projective transformations in nondifferentiable optimization, in Trends in Mathematical Optimization, ISNM, Vol. 84, K.-H. Hoffmann, J.-B. Hiriart-Urruty, C. Lemarechal, and J. Zowe, eds., Birkhäuser Verlag, Basel, 1988, pp. 79-91.

[29] J.-L. GOFFIN, A. HAURIE, AND J.-P. VIAL, Decomposition and nondifferentiable optimization with the projective algorithm, Tech. Rep. G-89-25, GERAD, Ecole des Hautes Etudes Commerciales, Université McGill, Montreal, Canada, 1989.

[30] J.-L. GOFFIN AND J.-P. VIAL, Cutting planes and column generation techniques with the projective algorithm, J. Optim. Theory Appl., 65 (1990), pp. 409-429.

[31] D. GOLDFARB AND S. LIU, An O(n³L) primal interior point algorithm for convex quadratic programming, Math. Programming, 49 (1990/91), pp. 325-340.

[32] D. GOLDFARB AND M. TODD, Linear programming, in Handbooks in Operations Research and Management Science, Vol. I: Optimization, G. Nemhauser, A. R. Kan, and M. Todd, eds., North-Holland, Amsterdam, 1989, Chap. 2.

[33] C. GONZAGA, Search directions for interior linear programming methods, Memorandum, UCB/ERL M87/44, Electronics Research Laboratory, University of California, Berkeley, CA, March 1987.

[34] , Conical projection algorithms for linear programming, Math. Programming, 43 (1988), pp. 151-173.

[35] , Interior point algorithms for linear programming problems with inequality constraints, Math. Programming, 52 (1991), pp. 209-226.

[36] , An algorithm for solving linear programming problems in O(n³L) operations, in Progress in Mathematical Programming-Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, Berlin, 1989, Chap. 1.

[37] , Large step path-following methods for linear programming, Part I: Barrier function method, SIAM J. Optim., 1 (1991), pp. 268-279.

[38] , Large step path-following methods for linear programming, Part II: Potential reduction method, SIAM J. Optim., 1 (1991), pp. 280-292.

[39] , Convergence of the large steps primal affine scaling algorithm for primal non-degenerate linear programming, internal report, COPPE - Federal University of Rio de Janeiro, Rio de Janeiro, Brasil, 1990.

[40] , On lower bound updates in primal potential reduction methods for linear programming, Math. Programming, 52 (1991), pp. 415-428.

[41] , Polynomial affine algorithms for linear programming, Math. Programming, 49 (1990), pp. 7-21. [42] C. GONZAGA AND M. J. TODD, An O(√n L)-iteration large-step primal-dual affine algorithm for linear

programming, Tech. Rep. 862, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1989, SIAM J. Optim., 2 (1992), to appear.

[43] M. GRÖTSCHEL AND O. HOLLAND, Solving matching problems with linear programming, Math. Programming, 33 (1985), pp. 243-259.

[44] D. D. HERTOG AND C. ROOS, A survey of search directions in interior point methods for linear programming, Report 89-65, Faculty of Technical Mathematics and Informatics, Technische Universiteit Delft, Holland, 1989.

[45] D. D. HERTOG, C. Roos, AND T. TERLAKY, On the classical logarithmic barrier function method for a class of smooth convex programming problems, Report 90-28, Faculty of Technical Mathematics and Informatics, Technische Universiteit Delft, Holland, 1990.

[46] , A polynomial method of weighted centers for convex quadratic programming, J. Inform Optim. Sci., 12 (1991), pp. 187-205.

[47] , A potential reduction method for a class of smooth convex programming problems, Report 90-01, Faculty of Technical Mathematics and Informatics, Technische Universiteit Delft, Holland, 1990.

[48] , A potential reduction variant of Renegar's short-step path-following method for linear program- ming, Linear Algebra Appl., 152 (1991), pp. 43-68.

[49] D. D. HERTOG, C. ROOS, AND J.-P. VIAL, Polynomial-time long-steps algorithms for linear programming based on the use of the logarithmic barrier function, Faculty of Technical Mathematics and Informatics, Technische Universiteit Delft, Holland, 1990, manuscript.

[50] P. HUARD, Resolution of mathematical programming with nonlinear constraints by the method of centers, in Nonlinear Programming, J. Abadie, ed., North-Holland, Amsterdam, 1967.

[51] H. IMAI, On the convexity of the multiplicative version of Karmarkar's potential function, Math. Programming, 40 (1988), pp. 29-32.

[52] F. JARRE, On the convergence of the method of analytic centers when applied to convex quadratic programs, Math. Programming, 49 (1990/91), pp. 341-358.

[53] , The Method of Analytic Centers for Smooth Convex Programs, Grottenthaler Verlag, Bamberg, 1989.

[54] S. KAPOOR AND P. VAIDYA, Fast algorithms for convex quadratic programming, J. of the ACM, (1986), pp. 147-159.

[55] N. KARMARKAR, A new polynomial time algorithm for linear programming, Combinatorica, 4 (1984), pp. 373-395.

[56] N. KARMARKAR, M. RESENDE, AND K. RAMAKRISHNAN, An interior point algorithm to solve computationally difficult set covering problems, AT&T Bell Laboratories, Murray Hill, NJ, 1989, manuscript.

[57] L. KHACHIYAN AND M. J. TODD, On the complexity of approximating the maximal inscribed ellipsoid for a polytope, Tech. Rep. 893, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1990.

[58] L. G. KHACHIYAN, A polynomial algorithm for linear programming, Soviet Math. Dokl., 20 (1979), pp. 191-194.

[59] , Polynomial algorithms in linear programming, USSR Computational Mathematics and Math. Phys., 20 (1980), pp. 53-72.

[60] V. KLEE AND G. MINTY, How good is the simplex algorithm?, in Inequalities III, O. Shisha, ed., Academic Press, New York, NY, 1972.

[61] M. KOJIMA, N. MEGIDDO, AND T. NOMA, Homotopy continuation methods for complementarity problems, Res. Rep. RJ6638 (63949), IBM Research, Almaden Research Center, San Jose, CA, 1989.

[62] M. KOJIMA, N. MEGIDDO, T. NOMA, AND A. YOSHISE, A unified approach to interior point algorithms for linear complementarity problems, Lecture Notes in Comput. Sci., 538, Springer-Verlag, Berlin, 1991.

[63] M. KOJIMA, S. MIZUNO, AND A. YOSHISE, An O(√n L) iteration potential reduction algorithm for linear complementarity problems, Math. Programming, 50 (1991), pp. 331-342.

[64] , A polynomial-time algorithm for a class of linear complementarity problems, Math. Programming, 44 (1989), pp. 1-26.

[65] , A primal-dual interior point method for linear programming, in Progress in Mathematical Programming - Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, Berlin, 1989, Chap. 2.

[66] K. KORTANEK AND J. ZHU, New purification algorithms for linear programming, Naval Res. Logist., 35 (1988), pp. 571-583.

[67] M. KOZLOV, S. TARASOV, AND L. KHACHIYAN, Polynomial solvability of convex quadratic programming, Dokl. Acad. Nauk, SSSR, 5 (1979), pp. 1051-1053.

[68] I. LUSTIG, R. MARSTEN, AND D. SHANNO, On implementing Mehrotra's predictor-corrector interior point method for linear programming, Tech. Rep. SOR 90-03, Dept. of Civil Engineering and Operations Research, Princeton University, Princeton, NJ, 1990.

[69] K. MCSHANE, C. MONMA, AND D. SHANNO, An implementation of a primal-dual interior point method for linear programming, ORSA J. Comput., 1 (1989), pp. 70-83.

[70] N. MEGIDDO, Linear programming in linear time when the dimension is fixed, J. Assoc. Comput. Mach., 31 (1984), pp. 114-127.

[71] , On the complexity of linear programming, in Advances in Economic Theory, T. Bewley, ed., 1987, Cambridge University Press, Cambridge, MA, pp. 225-268.

[72] , Pathways to the optimal set in linearprogramming, in Progress in Mathematical Programming - Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, Berlin, 1989, Chap. 8.

[73] N. MEGIDDO AND M. SHUB, Boundary behaviour of interior point algorithms in linear programming, Math. Oper. Res., 14 (1989), pp. 97-146.

[74] S. MEHROTRA, On finding a vertex solution using interior point methods, Linear Algebra Appl., 152 (1991), pp. 233-253.

[75] , On the implementation of a (primal-dual) interior point method, Tech. Rep. 90-03, Dept. of IE/MS, Northwestern University, Evanston, IL, 1990.

[76] S. MEHROTRA AND J. SUN, An algorithm for convex quadratic programming that requires O(n^3.5 L) arithmetic operations, Math. Oper. Res., 15 (1990), pp. 342-363.

[77] , An interior point algorithm for solving smooth convex programs based on Newton's method, in Mathematical Developments Arising from Linear Programming: Proceedings of a Joint Summer Research Conference held at Bowdoin College, Brunswick, Maine, June/July 1988, J. C. Lagarias and M. J. Todd, eds., Contemp. Math., 114, American Mathematical Society, Providence, RI, 1990, pp. 265-284.

[78] , A method of analytic centers for quadratically constrained quadratic problems, SIAM J. Numer. Anal., 28 (1991), pp. 155-168.

[79] J. E. MITCHELL AND M. J. TODD, Solving combinatorial optimization problems using Karmarkar's algorithm, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1990, manuscript.

[80] S. MIZUNO, O(n^ρ L) iteration O(n³L) potential reduction algorithms for linear programming, Linear Algebra Appl., 152 (1991), pp. 155-168.

[81] , A new polynomial time method for a linear complementarity problem, Tech. Rep. 16, Dept. of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo, Japan, 1989.

[82] S. MIzuNo AND M. TODD, An O(n3L) long step path following algorithm for a linear complementar- ity problem, Tech. Rep. 23, Dept. of Industrial Engineering and Management, Tokyo Institute of Technology, Tokyo, Japan, 1989.

[83] S. MIZUNO, A. YOSHISE, AND T. KIKUCHI, Practical polynomial time algorithms for linear complementarity problems, J. Oper. Res. Soc. of Japan, 32 (1989), pp. 75-92.

[84] C. MONMA AND A. MORTON, Computational experience with a dual affine variant of Karmarkar's method for linear programming, Oper. Res. Lett., 6 (1987), pp. 271-267.

[85] R. MONTEIRO, Convergence and boundary behaviour of the projective scaling trajectories for linear pro- gramming, in Mathematical Developments Arising from Linear Programming: Proceedings of a Joint Summer Research Conference held at Bowdoin College, Brunswick, Maine, June/July 1988, J. C. Lagarias and M. J. Todd., eds., Contemp. Math., 114, American Mathematical Society, Prov- idence, RI, 1990, pp. 213-229.

[86] , On the continuous trajectories for a potential reduction algorithm for linearprogramming, AT&T Bell Laboratories, Holmdel, NJ, 1989, preprint.

[87] R. MONTEIRO AND I. ADLER, An extension of Karmarkar type algorithm to a class of convex separable programming problems with global linear rate of convergence, Math. Oper. Res., 15 (1990), pp. 408- 422.

[88] R. MONTEIRO, I. ADLER, AND M. RESENDE, A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension, Math. Oper. Res., 15 (1990), pp. 191-214.

[89] R. C. MONTEIRO AND I. ADLER, Interior path-following primal-dual algorithms, part I: Linear program- ming, Math. Programming, 44 (1989), pp. 27-41.

[90] , Interior path-following primal-dual algorithms, part II: Convex quadratic programming, Math. Programming, 44 (1989), pp. 43-66.

[91] W. MYLANDER, R. HOLMES, AND G. MCCORMICK, A guide to SUMT-version 4: The computer program implementing the sequential unconstrained minimization technique for nonlinear programming, Research paper RAC-P-63, Research Analysis Corporation, McLean, VA, 1971.

[92] J. L. NAZARETH, Homotopy techniques in linear programming, Algorithmica, 1 (1986), pp. 529-535.

[93] Y. NESTEROV AND A. NEMIROVSKY, A general approach to polynomial-time algorithms design for convex programming, report, Central Economical and Mathematical Institute, USSR Academy of Sci- ences, Moscow, USSR, 1988.

[94] , Self-Concordant Functions and Polynomial Time Methods in Convex Programming, Moscow, USSR, 1989.

[95] E. POLAK, Computational Methods in Optimization, Academic Press, New York, NY, 1971.

[96] J. RENEGAR, A polynomial-time algorithm based on Newton's method for linear programming, Math. Programming, 40 (1988), pp. 59-94.

[97] J. RENEGAR AND M. SHUB, Simplified complexity analysis for Newton LP methods, Tech. Rep. 807, School

of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1988; Algo- rithmica, to appear.

[98] C. Roos AND J.-P. VIAL, A simple polynomial method of approximate centres for linear programming, report, Faculty of Technical Mathematics and Informatics, Technische Universiteit Delft, Holland, 1988.

[99] , Long steps with the logarithmic penalty barrier function in linear programming, in Economic Decision-Making: Games, Economics and Optimization, J. Gabszeywicz, J. F. Richard, and L. Wolsey, eds., Elsevier Science Publisher B. V., Amsterdam, 1989, pp. 433-441.

[100] A. SCHRIJVER, Theory of Linear and Integer Programming, John Wiley, New York, NY, 1986.

[101] R. SHAMIR, The efficiency of the simplex method: A survey, Management Sci., 33 (1987), pp. 301-334.

[102] N. SHOR, Utilization of the operation of space dilatation in the minimization of convex functions, Kibernetika, 1 (1970), pp. 6-12; English translation: Cybernetics, 6, pp. 7-15.

[103] G. SONNEVEND, An analytical centre for polyhedrons and new classes of global algorithms for linear

(smooth, convex) programming, in Lecture Notes Control Inform. Sci. 84, Springer-Verlag, New York, NY, 1985, pp. 866-876.

[104] , New algorithms in convex programming based on a notion of "centre" (for systems of analytic inequalities) and on rational extrapolation, in Trends in Mathematical Optimization, ISNM, Vol. 84, K.-H. Hoffmann, J.-B. Hiriart-Urruty, C. Lemarechal, and J. Zowe, eds., Birkhäuser Verlag, Basel, 1988, pp. 311-327.

[105] G. SONNEVEND AND J. STOER, Global ellipsoidal approximations and homotopy methods for solving con- vex analytic programs, Appl. Math. Optim., 21 (1990), pp. 139-166.

[106] A. STEGER, An extension of Karmarkar's algorithm for bounded linear programming problems, master's thesis, Dept. of Applied Mathematics, State University of New York, Stony Brook, NY, August 1985. Condensed version in Operations Research Proceedings 1987, H. Schellhaas, P. van Beek, H. Isermann, R. Schmidt and M. Zijlstra, eds., Springer-Verlag, Berlin, West Germany, 1988, pp. 88-95.

[107] E. TARDOS, A strongly polynomial algorithm to solve combinatorial linear programs, Oper. Res., 35 (1986), pp. 250-256.

[108] M. TODD AND B. BURRELL, An extension of Karmarkar's algorithm for linear programming using dual variables, Algorithmica, 1 (1986), pp. 409-424.

[109] M. J. TODD AND Y. WANG, On combined phase 1-Phase 2 projective methods for linear programming, Tech. Rep. 877, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, 1989.

[110] M. J. TODD AND Y. YE, A centered projective algorithm for linear programming, Math. Oper. Res., 15 (1990), pp. 508-529.

[111] P. VAIDYA, An algorithm for linear programming which requires O(((m + n)n² + (m + n)^1.5 n)L), Math. Programming, 47 (1990), pp. 175-201.

[112] , A locally well-behaved potential function and a simple Newton-type method for finding the center of a polytope, in Progress in Mathematical Programming - Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, Berlin, 1989, Chap. 5.

[113] , A new algorithm for minimizing a convex function over convex sets, in Proceedings of the 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, NC, 1989, IEEE Computer Society Press, Los Alamitos, CA, 1990, pp. 338-343.

[114] R. VANDERBEI AND J. C. LAGARIAS, I. Dikin's convergence results for the affine-scaling algorithm, in Mathematical Developments Arising from Linear Programming: Proceedings of a Joint Summer Research Conference held at Bowdoin College, Brunswick, Maine, June/July 1988, J. C. Lagarias and M. J. Todd, eds., Contemp. Math., 114, American Mathematical Society, Providence, RI, 1990, pp. 109-119.

[115] R. J. VANDERBEI, M. J. MEKETON, AND B. A. FREEDMAN, A modification of Karmarkar's linear programming algorithm, Algorithmica, 1 (1986), pp. 395-407.

[116] C. WITZGALL, P. BOGGS, AND P. DOMICH, On center trajectories and their relatives in linear programming, technical report, National Bureau of Standards, Gaithersburg, MD, 1988.

[117] D. XIAO AND D. GOLDFARB, A path-following projective interior point method for linear programming, manuscript, Dept. of Industrial Engineering and Operations Research, Columbia University, New York, NY, 1990.

[118] Y. YE, An O(n³L) potential reduction algorithm for linear programming, Math. Programming, 50 (1991), pp. 239-258.

[119] , A class of projective transformations for linear programming, SIAM J. Comput., 19 (1990), pp. 457-466.

[120] , Line search in potential reduction algorithms for linear programming, Dept. of Management Sciences, The University of Iowa, Iowa City, IA, 1989, manuscript.

[121] Y. YE AND M. KOJIMA, Recovering optimal dual solutions in Karmarkar's polynomial algorithm for linear

[122] Y. YE AND E. TSE, A polynomial-time algorithm for convex quadratic programming, Engineering- Economic System Dept., Stanford University, Stanford, CA, 1986, manuscript.

[123] D. YUDIN AND A. NEMIROVSKII, Informational complexity and efficient methods for the solution of convex extremal problems, Ekon. i Mat. Metody, 12 (1976), pp. 357-369. (In Russian.) Matekon 13(2), pp. 3-25. (In English.)
