
Annals of Operations Research 47 (1993) 343-359

On degeneracy in linear programming and related problems

Karen George

School of Information Sciences, University of Canberra, Canberra, Australia

M.R. Osborne

Mathematical Sciences School, Australian National University, Australia

Methods related to Wolfe's recursive method for the resolution of degeneracy in linear programming are discussed, and a nonrecursive variant which works with probability one is suggested. Numerical results for both nondegenerate problems and problems constructed to have degenerate optima are reported. These are obtained using a careful implementation of the projected gradient algorithm [11]. They are compared with results obtained using a steepest descent approach which can be implemented by means of a closely related projected gradient method, and which is not affected by degeneracy in principle. However, the problem of correctly identifying degenerate active sets occurs with both algorithms. The numerical results favour the more standard projected gradient procedure which resolves the degeneracy explicitly. Extension of both methods to general polyhedral convex function minimization problems is sketched.

Keywords: Recursive resolution, projected gradient algorithm, steepest descent method, generating model problems, l1 estimation, minimizing polyhedral convex functions.

1. Introduction

Degeneracy in linear programming is an artefact of particular solution methodologies rather than an intrinsic difficulty with applications problems. It has its origin in a generic assumption of a simple structure for the vertices of the feasible region. It is a potential cause of difficulties (for example, cycling and stalling) when the generic assumption is invoked explicitly in order to generate a descent direction.

In Wolfe [15] a method for resolving degeneracy in the simplex method of linear programming is described. At the time, it seems to have been considered to be of academic interest only despite Wolfe's explicit description of the (rather minimal) implementation needs. Presumably this was a result of the widespread

© J.C. Baltzer AG, Science Publishers

K. George, M.R. Osborne / On degeneracy

belief that the problem it addressed was not one encountered in practice. References to its use in the years up to about 1985 are extremely sparse (one exception is [9]). A number of factors have conspired to change this situation. For example: degeneracy related problems were discovered to contribute substantially to the difficulty in using linear programming based methods in scheduling and related problems [14], certain practically derived data sets were discovered to lead to varieties of degenerate behaviour under methods of data analysis based on linear programming [1], and linear programming formulations of certain data analysis problems were shown to lead to constraint sets every vertex of which is degenerate [12]. Equally important has been the more general realisation that the problem is not obscure and is capable of satisfactory algorithmic resolution [5, 11]. Appropriate implementation has been shown to enhance the problem solving capability of important packages [8].

This paper considers methods for the resolution of degeneracy in linear programming based on Wolfe's ideas in section 2. The key idea is a geometric one. In the presence of degeneracy, a descent direction can be constructed if and only if a certain reduced problem constructed using only the active constraints (the constraints involved in the degeneracy) is unbounded below. Then any unbounded direction for the reduced problem provides a direction which breaks the current degeneracy. These directions are independent of the constraint right hand sides, and this observation permits the development of suitable computing strategies. Typically these are recursive (at least conceptually) because degeneracy is considered a possibility in the reduced problem. Termination of the recursion is shown to be a consequence of the successive reduced problems being strictly smaller in size. We also give a closely related, nonrecursive procedure which generates a direction breaking the current degeneracy with probability one. An implementation of this method based on the projected gradient algorithm is exemplified in section 3 using randomly generated test problems. The idea here is that it is straightforward to generate test problems with degenerate optima. A steepest descent method is considered in section 4. This can be implemented in a very similar manner to the projected gradient algorithm to which it is closely related, but it generates a descent direction without reliance on the generic vertex assumption, using a procedure which has successful termination guaranteed in exact arithmetic. A numerical comparison with the projected gradient algorithm is given also. Both methods can be applied to more general problems of the minimization of polyhedral convex functions. This is sketched in sections 5 and 6.

2. Recursive resolution of degeneracy

To explain the problem and Wolfe's solution it is perhaps simplest to consider the dual of the standard problem in the form

min_{x ∈ X} c^T x;   X = {x : Ax ≥ b},   (2.1)


where A : ℝ^p → ℝ^n is assumed to have (full) rank p. Then the current x is a vertex of the feasible region if there exists an index set σ pointing to the rows of A such that

|σ| ≥ p,   rank(A_σ) = p,

and

A_σ x = b_σ,   A_{σ^c} x > b_{σ^c},

where A_σ is the submatrix with rows indexed by σ and σ^c = {1, 2, ..., n}\σ. Let x be a vertex. If the generic assumption |σ| = p is made then it is possible to test for optimality by computing u from

A_σ^T u = c.   (2.2)

If u ≥ 0 then the current x is an optimum. If not then (say) u_k < 0, so that

t = A_σ^{-1} e_k   (2.3)

satisfies

c^T t = e_k^T A_σ^{-T} A_σ^T u = u_k < 0   (2.4)

and the jth row of A_σ satisfies

a_{σ(j)}^T t = δ_{jk},   j = 1, 2, ..., |σ|.   (2.5)

Thus t provides a descent direction (edge) in which the σ(k)th constraint becomes inactive while the other constraints active at x remain active. Thus σ ← σ\{σ(k)}. But if |σ| > p this simple picture becomes clouded. Selecting p of the active constraints and proceeding to estimate u, t as before need not be satisfactory because a move in the direction defined by t could immediately violate one of the remaining constraints. Also, at an optimum there is now no guarantee that any particular selection of p constraints need give a nonnegative u although the existence of a feasible multiplier vector is guaranteed. If x is not optimal, then what is needed in order to reduce the objective function is a direction t which satisfies the descent condition

c^T t < 0   (2.6)

while preserving the feasibility of the active constraints

A_σ t ≥ 0.   (2.7)

Remark 2.1

This pair of inequalities is fundamental to the resolution of the problems caused by degeneracy. Construction of t satisfying them, however disguised, is the purpose of all resolution procedures. The key feature of Wolfe's method is that it seeks such a t directly.
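The test in remark 2.1 can be sketched directly: a direction satisfying (2.6), (2.7) exists exactly when a box-bounded linear program over the active constraints has a negative optimum. The following is our own illustration (the function name and tolerance are assumptions, and this is not the paper's F-Web implementation):

```python
# Sketch of remark 2.1 (our illustration, not the paper's code): seek t
# with c^T t < 0 and A_sigma t >= 0 via a box-bounded LP.
import numpy as np
from scipy.optimize import linprog

def break_degeneracy(c, A_sigma, tol=1e-9):
    """Return t satisfying (2.6)-(2.7), or None if no such direction exists."""
    p = len(c)
    # minimize c^T t  s.t.  -A_sigma t <= 0  (i.e. A_sigma t >= 0);
    # the box -1 <= t_i <= 1 keeps the LP bounded while preserving directions
    res = linprog(c, A_ub=-A_sigma, b_ub=np.zeros(A_sigma.shape[0]),
                  bounds=[(-1.0, 1.0)] * p, method="highs")
    return res.x if res.status == 0 and res.fun < -tol else None

# three constraints of a problem in R^2, all active at a degenerate vertex
A_sigma = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = break_degeneracy(np.array([-1.0, -1.0]), A_sigma)  # a descent direction
```

With c = (1, 1)^T the same call returns None, since every t with A_σ t ≥ 0 then has c^T t ≥ 0 and the vertex is optimal.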

A direction satisfying (2.6), (2.7) exists if and only if the problem

min_{w ∈ W} c^T w;   W = {w : A_σ w ≥ ε}   (2.8)

is unbounded below, and any such direction is an unbounded direction for (2.8). The set of such directions is independent of ε provided ε is bounded. It follows that ε ≤ 0 can be chosen so that w = 0 defines a generic vertex for (2.8). If the basic solution procedure (for example, the projected gradient algorithm) is applied to (2.8) then the degeneracy is resolved if and only if an unbounded direction is detected. Otherwise, a bounded optimum is found for (2.8), and the current x is an optimum for (2.1). Because this approach replaces the problem with a related one of similar form, it is possible for the solution of the new problem also to encounter degeneracy. But the technique can be applied recursively until either a descent direction is produced or optimality is verified.

Termination of the recursive method is verified readily. Because ε is chosen at each level so that w = 0 is a generic vertex, which means a constraint becomes inactive in the initial step in the resolution procedure, it follows that |σ_0| > |σ_1| > ..., where |σ_j| is the number of active constraints at recursive level j. Thus the number of levels cannot exceed |σ_0| − p. Also, the standard termination argument that notes the objective function at the current level is reduced at each effective step ensures that active sets cannot repeat in the current subproblem. Thus the search for t is finite.

Implementation of the recursive method is straightforward. It can be implemented with the only additional storage required being a vector of length |σ_0| which holds the recursive depth at which each of the constraints pointed to by σ_0 remain involved. The ε_j can be stored on a_{σ(j)}^T x − b_{σ(j)}, j = 1, 2, ..., |σ|, and these are reset to zero when the degeneracy at the current level is resolved.

But how necessary is this apparatus? In the experiments with highly degenerate problems reported by Ryan and Osborne [12] recursive depths greater than one occurred only for contrived choices of ε. This should be expected. Consider the first step of the recursive formulation. Further degeneracy can only occur if there is an index set ν, ν ⊂ σ, |ν| > p, rank(A_ν) = p, such that there is a feasible w satisfying

A_ν w = ε_ν,   (2.9)

for the particular choice of ε made. As the image under A_ν of any compact set in ℝ^p has zero volume in ℝ^{|ν|} it follows that, in exact arithmetic, the probability of satisfying (2.9) is zero for random choice of ε. Such a choice could be generated by sampling a uniform distribution on [0, 1] in order to select each −ε_i > 0, for example.


In the extremely remote case that (2.9) does hold, the degeneracy can be removed by setting

ν = ν_1 ∪ ν_2,   |ν_1| = p,   rank(A_{ν_1}) = p,

ε_{ν_2(i)} ← ε_{ν_2(i)} − random(> 0),   i = 1, 2, ..., |ν_2|.   (2.10)

Now ν points to a generic vertex of a modified version of (2.8). But this modification has the same unbounded directions so the search can continue from w. As any subsequent degenerate vertex can involve at most (|σ| − 1) active constraints it follows that repetition of a degenerate vertex requires more than one restart using (2.10). As each restart requires the occurrence of a rare event, and as an independent sample is made at each restart, the probability of repeating a degenerate vertex is the product of these rare event probabilities. It follows that cycling can occur only with probability zero. The degeneracy removal based on (2.10) is nonrecursive because the same constraint matrix A_σ is used for each subproblem. Clearly there are quite close connections between the suggested procedure and informal methods that resolve degeneracy by introducing small random perturbations in order to make progress. But here there is the distinct advantage that ε need only possess a reasonable bound.

Detection of degeneracy is just the problem of deciding that numerically approximately satisfied inequalities correspond to active constraints. This is not easy in the presence of rounding error. Gill et al. [8] propose to avoid this problem by insisting on a minimum step length even if this forces constraint violations, and argue that they terminate with the optimal solution of a nearby problem. The argument to show that they move correctly through degenerate vertices when these occur is very similar to ours.

3. Numerical experiments

In this section results of computations made using randomly generated test problems with known solutions are reported. In the nondegenerate case the technique used to generate these problems is described in [11]. First a random selection of p numbers from 1, 2, ..., n, chosen without replacement, is made in order to generate the row pointers in the optimal σ. Then A, x, and u > 0 are set by sampling a random number generator followed by any desired shifting and scaling. The conditions on u guarantee a unique solution. The Kuhn-Tucker conditions give c = A_σ^T u, and the right hand side of the constraint inequalities is constructed from b = Ax − ε where ε_i = 0 if i ∈ σ, and ε_i > 0 otherwise and can be chosen by any suitable random procedure. The argument of the previous section suggests that degeneracy is most unlikely to be encountered at a nonoptimal vertex in solving these problems, but a degenerate optimum can be arranged by the simple expedient of allowing |σ| > p. Uniqueness of this optimum is ensured if exactly p components of the associated u are > 0. Thus termination in the presence of degeneracy can be monitored under controlled conditions.
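The construction just described can be sketched in a few lines (our own code, not the paper's generator; the function name and sampling ranges are assumptions):

```python
# Sketch of the test-problem generator described above; names and
# distributions are our own illustration.
import numpy as np

def make_lp(p, n, n_active, rng):
    """Build c, A, b for (2.1) with known optimum x and active set sigma;
    n_active > p gives a degenerate optimum."""
    sigma = rng.choice(n, size=n_active, replace=False)
    A = rng.standard_normal((n, p))        # full rank with probability one
    x = rng.standard_normal(p)
    u = np.zeros(n_active)
    u[:p] = rng.uniform(0.5, 2.0, p)       # exactly p positive multipliers
    c = A[sigma].T @ u                     # Kuhn-Tucker: c = A_sigma^T u
    eps = rng.uniform(0.5, 2.0, n)
    eps[sigma] = 0.0                       # zero slack on the active rows
    b = A @ x - eps                        # so A x >= b, equality on sigma
    return c, A, b, x, sigma
```

By construction x is feasible, the rows in sigma are active, and the multiplier condition guarantees optimality.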

These problems are solved using a tableau based implementation of the projected gradient algorithm which follows the description given in [11]. A detailed commentary is not feasible here, but the program has been written as an F-Web [10] and the output is available via anonymous ftp as files ACTR 10 06 92.c and ACTR 9 06 92.ps.Z from the directory ftp/pub/acreports on thrain.anu.edu.au (log in as anonymous and use your email address as password). Both the postscript produced from the weave output (a TeX documented listing) and the tangle output (a C program) are available. Here, for comparison with the steepest descent method, the calculation of the descent direction using the projected gradient algorithm is sketched under the generic assumptions

rank(A_σ) = |σ| ≤ p,

where σ points to the constraints active at the current point x. The basic idea is to solve for t the least squares problem

min_u ||t||_2;   t = −c + A_σ^T u.   (3.1)

If c ∉ range(A_σ^T) then ||t|| > 0, and it follows from the least squares condition

t^T A_σ^T = 0

that

c^T t = −||t||^2 < 0.

This shows that t is a feasible descent direction. Otherwise, if u ≥ 0 then the Kuhn-Tucker conditions hold and x is optimal; else u_i < 0 for at least one i. Then σ ← σ\{σ(i)} and t is recalculated. Heuristics for selecting u_i from among the class of possible contenders are considered in [11]. It is straightforward to verify that the new t is a feasible descent direction. The algorithm now steps in the direction t until a new active constraint is encountered. Then σ is updated, and the sequence of calculations begins again unless the new point is optimal. Degeneracy occurs if more than one new constraint becomes active simultaneously.
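The descent-direction loop just described can be sketched with a dense least squares solve (our own illustration; the paper's tableau implementation updates factorizations rather than refactorizing):

```python
# Sketch of the descent-direction calculation (3.1); assumes
# rank(A_sigma) = |sigma| and refactorizes at each pass for clarity.
import numpy as np

def descent_direction(c, A_sigma, tol=1e-10):
    """Solve (3.1), dropping a constraint with u_i < 0 whenever t = 0.
    Returns (t, kept_rows); t is None when x is optimal."""
    keep = list(range(A_sigma.shape[0]))
    while True:
        As = A_sigma[keep]
        u = np.linalg.lstsq(As.T, c, rcond=None)[0] if keep else np.empty(0)
        t = -c + As.T @ u if keep else -c
        if np.linalg.norm(t) > tol:
            return t, keep                  # feasible descent direction
        if not keep or u.min() >= -tol:
            return None, keep               # Kuhn-Tucker conditions hold
        del keep[int(np.argmin(u))]         # drop sigma(i) and recompute
```

The least squares residual t automatically satisfies A_σ t = 0 on the kept rows, so the retained constraints stay active along the step.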

The first problem associated with the occurrence of degeneracy in linear programming is correctly identifying the active constraints in the presence of numerical error. Once these have been identified, the resolution of the degeneracy related problems is a relatively straightforward procedure. At the current point x the active constraints are those for which

a_j^T x = b_j + δ_j,   (3.2)


Table 1
Median number of iterations, nondegenerate case.

p \ n      50     150     450    1350
10         15      20      28      31
20                 39      49      66
40                 72      95     126

where |δ_j| must be suitably small in comparison with the working tolerance. The difficulty lies in choosing this tolerance. Actions that have been taken to make a test based on the size of |δ_j| viable include:

(1) the use of a projected gradient algorithm based on orthogonal factorization of A_σ to solve the least squares problem (3.1). Both the stability of the transformation and the invariance of row lengths are seen as advantages in this context;

(2) the scaling of the lengths of the constraint rows, the right hand side b, and the objective function representer c in order to control the magnitudes of numbers occurring; and

(3) the use of a normalised steepest edge test [11] in selecting the constraint to drop in generating the current descent direction. This test compares directional derivatives on the possible descent edges. It requires a knowledge of the diagonal elements of (A_σ A_σ^T)^{-1}, but these can be updated efficiently using a recurrence relation, and they give directly the Frobenius condition number of A_σ which can be used in setting tolerances.
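An active-set test of the form (3.2) might be sketched as follows; the particular tolerance scaling here is our own illustration, not the paper's exact rule:

```python
# Sketch of the active-constraint test (3.2); the tolerance scaling is
# our own illustration rather than the paper's exact rule.
import numpy as np

def active_set(A, b, x, cond=1.0, macheps=np.finfo(float).eps):
    """Indices j with delta_j = a_j^T x - b_j small relative to a working
    tolerance scaled by problem size, conditioning and data magnitude."""
    delta = A @ x - b
    n, p = A.shape
    scale = max(1.0, float(np.abs(b).max()))
    tol = max(n * p, cond) * macheps * scale
    return np.flatnonzero(np.abs(delta) <= tol)
```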

Numerical results are reported in tables 1 and 2. In table 1 the results given are the medians of the number of descent steps for each of 11 trials for nondegenerate problems for each of the dimensions indicated. In table 2 the results reported are for problems constructed to have degenerate optima. The reduced problems (2.8) were nondegenerate in all cases.

Here the column headed "nd" gives the number of constraints active at the

Table 2
Median number of iterations, degenerate case.

p    nd     50      150     450     1350
10   20     12(4)   21(2)   25(3)   26(3)
20   40             33(6)   49(5)   54(4)
40   60             69(13)  94(9)   123(7)


optimum, and the additional entries in brackets give the median number of steps taken after the degeneracy had been diagnosed. In each case the number of active constraints at the optimum is determined correctly using a tolerance of max(np, cond(A_σ)) macheps. In order to find a first feasible solution with which to start the computation the problem is embedded in the problem

min_{[x; x_{p+1}] ∈ X̂} c^T x + γ x_{p+1};   X̂ = {[x; x_{p+1}] : Ax + e x_{p+1} ≥ b, x_{p+1} ≥ 0},   (3.3)

where e = (1, 1, ..., 1)^T,

which has the feasible point [0, max_i max(b_i, 0)]^T, and the solution [x, 0]^T, where x minimizes (2.1), provided γ is large enough. Here γ may be thought of as a penalty parameter chosen to force x_{p+1} to zero in order to make the objective function small. The new problem may be unbounded below if γ is too small, even when (2.1) has a well determined solution; and if x_{p+1} > 0 for γ arbitrarily large then the original problem has no feasible solution [11].
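The embedding (3.3) and its feasible starting point can be assembled mechanically; the following is our own sketch of that construction (the function name is an assumption):

```python
# Sketch of the embedding (3.3) built from the description above;
# gamma is the penalty parameter, chosen by the caller.
import numpy as np

def embed(c, A, b, gamma):
    """Return c_hat, A_hat, b_hat for (3.3), with constraints written as
    A_hat x_hat >= b_hat, plus the feasible start [0, max_i max(b_i, 0)]."""
    n, p = A.shape
    top = np.hstack([A, np.ones((n, 1))])   # A x + e x_{p+1} >= b
    bottom = np.eye(1, p + 1, p)            # x_{p+1} >= 0
    A_hat = np.vstack([top, bottom])
    b_hat = np.append(b, 0.0)
    c_hat = np.append(c, gamma)
    x0 = np.append(np.zeros(p), max(float(b.max()), 0.0))
    return c_hat, A_hat, b_hat, x0
```

The starting point is feasible because each b_i is dominated by the common slack variable, which is itself nonnegative.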

If the modified problem (3.3) has a well determined (nondegenerate) solution with x_{p+1} > 0 for all γ large enough then it follows from the linearity of the objective function that the multiplier vector satisfies

u = [A_σ  e_σ]^{-T} e_{p+1} > 0.   (3.4)

This provides a test for recognising the infeasible case. This test proved relevant for interpreting some of the numerical experiments. In these problems, constructed to have degenerate optima, the data were perturbed by writing them to the file which passed the problem description to the LP solver using a "%16.8e" format. As perturbations of this kind are quite likely to turn the overspecified active set defining the degenerate optimum into contradictory constraints, it is not surprising that (3.4) was now observed to hold at the optimum with x_{p+1} > 0 to working tolerance but still very small (O(10^{-11}) compared to a tolerance of O(10^{-14})). Here the correct conclusion is that x is a good approximation to the solution of a close by, exactly degenerate problem.

The results reported in table 2 suggest that the degeneracy resolving procedure based on solving the reduced problem (2.8) is effective. One way to assess this is by comparing these results with those for nondegenerate problems reported in table 1, and noting that p appears a more important parameter than n in predicting the number of iterations. There is no evidence to suggest that the random choice of ε is unsatisfactory.

4. A steepest descent algorithm for linear programming

The method described in this section is due to Dax [3]. For simplicity let rank(A_σ) = min(p, |σ|), so the case rank(A_σ) < min(p, |σ|) is not considered here.


The idea is to compute the descent direction t at the current point x by solving the constrained least squares problem

min_{u ≥ 0} ||t||_2;   t = −c + A_σ^T u.   (4.1)

The connection with steepest descent follows by noting that this problem finds the closest point to the origin in the subdifferential of the problem epigraph at x. The formulation should be compared with (3.1). The Kuhn-Tucker conditions for (4.1) give

t^T A_σ^T = z^T,   z^T u = 0,   z_i ≥ 0,   i = 1, 2, ..., |σ|.   (4.2)

It follows from (4.1), (4.2) that

||t||^2 = −c^T t + t^T A_σ^T u = −c^T t,

so that t is a descent direction. Also, there need be at most p positive u_i in the solution to (4.1). If there are exactly p, and if these are associated with linearly independent columns of A_σ^T, then t = 0 in (4.1) and x is an optimum for (2.1).
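Problem (4.1) is exactly a nonnegative least squares problem in u, so for illustration it can be solved with a library NNLS routine (a sketch under that substitution; the paper uses algorithm RLSD, described next):

```python
# Problem (4.1) is a nonnegative least squares problem in u; this sketch
# uses scipy's NNLS solver rather than the RLSD algorithm of the paper.
import numpy as np
from scipy.optimize import nnls

def sd_direction(c, A_sigma):
    """min ||t||_2, t = -c + A_sigma^T u, u >= 0; returns (t, u)."""
    u, _ = nnls(A_sigma.T, c)     # min ||A_sigma^T u - c||_2 over u >= 0
    return -c + A_sigma.T @ u, u
```

The Kuhn-Tucker conditions (4.2) then hold with z = A_σ t, and ||t||^2 = −c^T t as in the text.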

Algorithm RLSD described in [2] is used to solve (4.1). This is an active set descent method which starts by enforcing the complementarity condition in (4.2) by setting

u = [u_1; 0],   z = [0; z_2],   σ = σ_1 ∪ σ_2,   (4.3)

where σ_1, |σ_1| ≤ p, points to the rows of A_σ associated with u_1. Then A_σ^T = [A_{σ_1}^T  A_{σ_2}^T] can be factorized to give

A_σ^T = [Q_1  Q_2] [U_1  U_12; 0  B],   (4.4)

where Q = [Q_1  Q_2] is orthogonal, U_1 is upper triangular, but B need not be reduced. Algorithm RLSD ensures that U_1 is nonsingular. The first part of the current step of the algorithm is to solve

min_{u_2 = 0} ||t||_2;   t = −c + A_σ^T u.   (4.5)

This gives

t = −Q_2 Q_2^T c,   (4.6)

u_1 = U_1^{-1} Q_1^T c,   (4.7)

z_2 = −B^T Q_2^T c.   (4.8)


It is always possible to assume that u_1 > 0 or null initially. If z ≥ 0 then u is a Kuhn-Tucker point for (4.1). Otherwise, set y = u and select z_k < 0 (a normalised steepest edge test is given in [2]), and update σ:

σ_1 ← σ_1 ∪ {σ(k)},   σ_2 ← σ_2\{σ(k)}.   (4.9)

It is not guaranteed that u_1 > 0 for the updated σ, but u − y defines a descent direction for minimizing ||t||^2. Let q be the first component of y to vanish in this direction, update σ:

σ_1 ← σ_1\{q},   σ_2 ← σ_2 ∪ {q},   (4.10)

and recompute u_1. Repeat this descent step until u ≥ 0. Then test for optimality and continue until a Kuhn-Tucker point is found.

This algorithm is guaranteed to converge. Considered as a quadratic program, it does not encounter degeneracy (this follows from [6]), and it does not require |σ| ≤ p. Thus it can be applied in cases where conventional linear programming methods encounter degeneracy. However, it still requires the current active set to be identified correctly. Precautions similar to those enumerated in section 3 have to be taken to try and make this step numerically robust.

This procedure has been implemented as an F-Web and is available also by means of anonymous ftp from thrain.anu.edu.au. The files are ACTR 11 06 92.c and ACTR 12 06 92.ps.Z in the previously indicated ftp/pub/acreports directory. Tables 3 and 4 give the results for the same sequence of test problems.

In table 3 the bracketed numbers give the median number of iterations after |σ| > p has been detected at the final vertex. The steepest descent algorithm performs adequately. The correct optimal active set is found in all but one case with p = 40, and n = 1350. Here the final four vertices have |σ| = 42, 49, 53, 54, so the optimal vertex is split into a cluster of adjacent vertices by rounding error. This did not occur for the same problem with the projected gradient algorithm but more iterations were required in this case (201 against 142). The general pattern of results favours the projected gradient algorithm over the steepest descent algorithm. In fact, the margin is greater than it appears at first sight because each

Table 3
Median number of iterations, nondegenerate case.

p \ n      50     150     450    1350
10         17      21      26      34
20                 48      60      73
40                 90     120     148


Table 4
Median number of iterations, degenerate case.

p    nd     50      150      450      1350
10   20     21(5)   22(7)    33(6)    33(7)
20   40             54(8)    62(8)    74(6)
40   60             107(8)   135(10)  173(6)

steepest descent iteration can and frequently does involve several descent steps in computing u ≥ 0. The steepest descent algorithm works explicitly with an active set pointed to by σ, and this clearly differentiates it from the interior point methods which are the subject of much current interest. However, it differs from the projected gradient algorithm in building up to a vertex slowly. This is illustrated in fig. 1, which gives a plot of |σ| against number of iterations for a randomly generated, nondegenerate problem with p = 15, n = 500, and with the embedding problem (3.3) used to provide an initial feasible point. This plot shows what appears to be typical behaviour with a vertex being attained only in the last few iterations. Also noted in the numerical experiments was some evidence of hemstitching with several constraints being moved into the active set and then out again several times in the progress of an iteration. Our randomly generated problems

Fig. 1. Plot of |σ| vs. number of iterations for a randomly generated, nondegenerate problem (p = 15, n = 500).


would not be expected to treat the optimum vertex in any special way, so it may be that the steepest descent method (and possibly also interior point methods) would be relatively more successful in cases where the optimal vertex is better conditioned in the sense of being well separated from adjacent vertices.

We are indebted to a referee for pointing out to us that Dax in 1987 also compared the steepest descent algorithm with the projected gradient algorithm in an unpublished technical report of the Hydrological Service of Israel. Apparently his conclusions are similar to our own.

5. Application to l1 fitting

In this section it is shown that Wolfe's method for resolving degeneracy extends nicely to the problem of minimizing a polyhedral convex function. This will be described first for the l1 estimation problem which provides an adequate example. The extension to more general problems is then sketched briefly. Let

F(x) = Σ_{i=1}^n |r_i|,   r = Ax − b,   (5.1)

where A : ℝ^p → ℝ^n is again assumed to have full rank. Then the minimum of F is characterised by an index set σ = {i : r_i = 0} pointing to zero residuals such that

|σ| ≥ p,   rank(A_σ) = p,

and

0 ∈ ∂F = g + Σ_{i ∈ σ} [−1, 1] a_i,   (5.2)

where ∂F is the subdifferential of F,

g = Σ_{i ∉ σ} sgn(r_i) a_i,   (5.3)

is the gradient of the part of F differentiable at x and the second term in (5.2) gives the set valued contribution from the nondifferentiable terms to the subdifferential. Points satisfying the above condition on σ are the extreme points (or vertices) of the epigraph of F(x), and can be thought of as characterizing the current active set. There are descent algorithms analogous to those for linear programming for minimizing F [11]. These generate a descent direction in which exactly one of the r_i, i ∈ σ, moves away from zero at the current vertex. Then a linesearch is made for the minimum of F in this direction. This minimum occurs at a new vertex generated by (at least) one new residual vanishing. To construct a descent direction when |σ| = p = rank(A_σ) let

A_σ^T u = −g.   (5.4)


If u_i ∈ [−1, 1], i = 1, 2, ..., p, then (5.2) holds and x is optimal. Otherwise let u_k ∉ [−1, 1] and consider

t = θ A_σ^{-1} e_k.   (5.5)

The directional derivative [11] of F in the direction t is defined by

F'(x : t) = inf_{λ > 0} (F(x + λt) − F(x))/λ = sup_{v ∈ ∂F} v^T t.

Substituting t from (5.5) gives

F'(x : t) = max_{w ∈ [−1,1]} g^T t + θw = max_{w ∈ [−1,1]} θ[−u_k + w] = −|u_k| + 1 < 0 if θ = sgn(u_k);

also

t^T a_{σ(j)} = θ δ_{jk},

so that t is a descent direction with the required properties for this choice of θ. This shows that a descent direction is found easily when the generic assumption |σ| = p is satisfied.
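The construction (5.4)-(5.5) is short enough to sketch directly (our own code under the generic assumption |σ| = p; the data layout is an assumption):

```python
# Sketch of the l1 descent-direction construction (5.4)-(5.5) under the
# generic assumption |sigma| = p; function name and layout are ours.
import numpy as np

def l1_descent(A, r, sigma):
    """Given r = Ax - b with r_i = 0 exactly for i in sigma (|sigma| = p),
    return a descent direction t for F = sum |r_i|, or None if optimal."""
    mask = np.ones(len(r), dtype=bool)
    mask[sigma] = False
    g = A[mask].T @ np.sign(r[mask])      # gradient of the smooth part (5.3)
    u = np.linalg.solve(A[sigma].T, -g)   # (5.4): A_sigma^T u = -g
    if np.all(np.abs(u) <= 1.0):
        return None                       # (5.2) holds: x is optimal
    k = int(np.argmax(np.abs(u)))
    theta = np.sign(u[k])
    e_k = np.eye(len(sigma))[k]
    return theta * np.linalg.solve(A[sigma], e_k)   # (5.5)
```

Choosing the largest |u_k| is one heuristic; the text leaves the selection open.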

Degeneracy occurs when |σ| > p = rank(A_σ). Wolfe's ideas suggest seeking t directly to satisfy

F'(x : t) = g^T t + Σ_{i ∈ σ} |a_i^T t| < 0.   (5.6)

The connection with the resolution procedure in linear programming is established by noting that such a direction exists if and only if the modified l1 estimation problem

min_w g^T w + Σ_{i ∈ σ} |a_i^T w|   (5.7)

is unbounded below. It is also a direction in which the modified problem

min_w g^T w + Σ_{j=1}^{|σ|} |a_{σ(j)}^T w + ε_j|   (5.8)


is unbounded, where again ε can be chosen essentially arbitrarily. Thus the idea is to choose ε_i by a random selection procedure subject to the constraint that w = 0 is a nondegenerate vertex for (5.8). For example, we could set ε_i = 0 for the a_i inherited from the previous vertex (say i = σ(k), k = 1, 2, ..., p − 1), choose a_j, j = σ(p), the row to enter, so that A_σ is well conditioned (this information is available from the descent step [11]), and set ε_j = 0 also, and then choose the remaining ε_k at random. With the modification

g ← g + Σ_{j > p} sgn(a_{σ(j)}^T w + ε_j) a_{σ(j)}

being used in (5.4) in determining u, the basic l1 projected gradient algorithm can be applied to the modified problem (5.8) to find an unbounded direction or verify that the current point is optimal. The obvious analogues of both the recursive and nonrecursive procedures described in section 2 can be employed if further degeneracy is encountered in computing t.

Extension of this approach to more general problems is perhaps simplest when they are in the standard max form. Let F(x) be given by

F(x) = max_{i ∈ α} {c_i^T x + d_i},   (5.9)

where α is a (finite) defining index set, and let α_0 point to the active members of the affine set (those for which equality holds in (5.9)). In the l1 case the c_i, i ∈ α_0, are just the vectors:

f^T A,   f_j = sgn(r_j), r_j ≠ 0;   f_j = ±1, r_j = 0;   j = 1, 2, ..., n.

Now the problem of finding a descent direction at x is the problem of finding t such that

max_{i ∈ α_0} c_i^T t < 0.   (5.10)

Reinterpreting this in the compact form (5.6) connects back to the above discussion of the l1 problem. Degeneracy problems that occur when generic assumptions are violated also can be removed by noting that the problem

min_w max_{i ∈ α_0} {c_i^T w + ε_i}   (5.11)

is unbounded below for arbitrary ε_i if there exists t satisfying (5.10). This follows by evaluating (5.11) at λt where t satisfies (5.10) and λ > 0 is chosen large. Thus it is only necessary to choose ε so that w = 0 is a generic vertex, and to apply to (5.11) whatever solution technique is being used to minimize (5.9) in order to find an unbounded direction. Again, the obvious analogues of both the recursive and

K. George, M.R. Osborne~On degeneracy 357

nonrecursive procedures described in section 2 can be employed if further degeneracy is encountered in computing t.
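The unboundedness step can be spelled out in one line. Evaluating the objective of (5.11) at \( w = \lambda t \) with t satisfying (5.10),

\[
\max_{i \in \sigma_0} \{ c_i^T (\lambda t) + \varepsilon_i \}
\;\le\; \lambda \max_{i \in \sigma_0} c_i^T t + \max_{i \in \sigma_0} \varepsilon_i
\;\longrightarrow\; -\infty \quad \text{as } \lambda \to \infty,
\]

since (5.10) gives \( \max_{i \in \sigma_0} c_i^T t < 0 \) while the \( \varepsilon_i \) contribution is a fixed constant.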

6. A steepest descent method

The form of \( \partial F \) in the \( l_1 \) estimation problem (5.1) provides a special case of a general form for polyhedral convex functions which is available when the vertices of the epigraph are expressible as solutions of systems of structure equations

\( \phi_i(x) = 0, \quad i \in \sigma, \quad |\sigma| \ge p. \qquad (6.1) \)

The \( l_1 \) case corresponds to \( \phi_i(x) = r_i \), \( i \in \sigma \). Now it is possible [11] to write the subdifferential as

\( \partial F = g + Vw, \qquad (6.2) \)

where essentially g is the gradient of the differentiable part of F, the columns of V are given by

\( (V_i)^T = \nabla_x^T \phi_{\sigma(i)}, \quad i = 1, 2, \ldots, |\sigma|, \qquad (6.3) \)

and w is constrained by a condition of the form

\( w \in W, \qquad (6.4) \)

where W is the polyhedral convex set defined by the linear inequalities

\( t^T g + t^T V w \le F'(x : t), \)

where t runs over the edges of epi F meeting at x and \( F'(x : t) \) is the directional derivative of F at x in the direction t. Then a steepest descent variant of the projected gradient algorithm is available in which t is computed by solving the problem

\( \min_{w \in W} \|t\|_2^2; \qquad t = -g - Vw. \qquad (6.5) \)

In the \( l_1 \) estimation problem this is a procedure suggested by Dax [4]. It is the direct extension of his steepest descent algorithm for the linear programming problem. It follows directly from the definition (6.2) that \( -t \in \partial F(x) \), so that if t is zero then

\( 0 \in \partial F \)

and x is an optimum. Otherwise the separating hyperplane lemma gives

\( v^T t < 0, \quad \forall v \in \partial F(x), \qquad (6.6) \)


showing that

\( F'(x : t) = \max_{v \in \partial F(x)} v^T t < 0 \qquad (6.7) \)

and verifying that t is a descent direction. This shows that the steepest descent algorithm generalises to the problem of minimizing polyhedral convex functions in a natural way.
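For the \( l_1 \) case, where W is the box \( -1 \le w_i \le 1 \), the computation of t from (6.5) can be sketched by a projected gradient iteration on w. This is a sketch under that box assumption only, not the authors' code; a serious implementation would use an active-set quadratic programming method such as Fletcher's [6].

```python
import numpy as np

def steepest_descent_direction(g, V, iters=500):
    """Approximate t = -(g + V w) solving (6.5) with
    W the box -1 <= w_i <= 1 (the l1 case).

    Minimizes ||g + V w||^2 over the box by projected gradient:
    a gradient step in w followed by clipping back into the box.
    """
    m = V.shape[1]
    w = np.zeros(m)
    # step size from the Lipschitz constant 2*||V||^2 of the gradient
    L = 2.0 * np.linalg.norm(V, 2) ** 2
    step = 1.0 / max(L, 1e-12)
    for _ in range(iters):
        grad = 2.0 * V.T @ (g + V @ w)
        w = np.clip(w - step * grad, -1.0, 1.0)
    return -(g + V @ w)

# small check with hypothetical data: for V = I and g interior
# to the box, the minimum of ||g + w||^2 is at w = -g, so t = 0
g = np.array([0.3, -0.7])
t = steepest_descent_direction(g, np.eye(2))
```

Here \( t = 0 \) signals \( 0 \in \partial F \) and hence optimality, matching the test in the text.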

Remark 6.1

Provided V has full rank, it follows from [6] that degeneracy cannot occur in the solution of the quadratic program (6.5) provided the matrices of active constraints implied by the constraints \( w \in W \) have full row rank. When this condition is satisfied, the steepest descent algorithm generates a descent direction independent of generic assumptions on the structure of the generalised active set (6.1). It is obvious that this condition is satisfied for linear programming, and it is also satisfied for the general class of problems of minimizing convex, piecewise linear functions (Fourer [7], Rockafellar [13]). Here the constraints are interval constraints, and the particular case of \( l_1 \) estimation gives \( -1 \le w_i \le 1 \), \( i \in \sigma \), for example. Linear independence follows from the observation that both bounds cannot hold simultaneously. It remains true for the more general constraint conditions encountered in the data analysis procedure rank regression [11].

References

[1] G.W. Bassett and R.W. Koenker, An empirical quantile function for linear models with iid errors, J. Am. Stat. Assoc. 77 (1982) 407-415.

[2] D.I. Clark and M.R. Osborne, On linear restricted and interval least squares problems, IMA J. Numer. Anal. 8 (1988) 23-36.

[3] A. Dax, Linear programming via least squares, Lin. Alg. Appl. 111 (1988) 313-324.

[4] A. Dax, The \( l_1 \) solution of linear equations subject to linear constraints, SISSC 10 (1989) 328-340.

[5] R. Fletcher, Degeneracy in the presence of roundoff errors, Lin. Alg. Appl. 106 (1988) 149-183.

[6] R. Fletcher, Resolving degeneracy in quadratic programming, Report NA/135, Mathematics Department, University of Dundee (1991).

[7] R. Fourer, A simplex algorithm for piecewise linear programming, Math. Progr. 33 (1985) 204-233.

[8] P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright, A practical anti-cycling procedure for linearly constrained optimization, Math. Progr. 45 (1989) 437-474.

[9] M.J. Hopper and M.J.D. Powell, A technique that gains speed and accuracy in the minimax solution of over-determined linear equations, in: Mathematical Software III, ed. J.R. Rice (Academic Press, 1977) pp. 15-34.

[10] J.A. Krommes, The WEB system of structured software design and documentation for Fortran, Ratfor, and C, Technical Report, Princeton University; available by anonymous ftp from ss01.pppl.gov in directory /pub/fweb (1989).

[11] M.R. Osborne, Finite Algorithms in Optimization and Data Analysis (Wiley, 1985).

[12] M.R. Osborne, The reduced gradient algorithm, in: Statistical Data Analysis, ed. Y. Dodge (North-Holland, 1987) pp. 95-108.

[13] R.T. Rockafellar, Network Flows and Monotropic Optimization (Wiley, 1984).

[14] D.M. Ryan and M.R. Osborne, On the solution of highly degenerate linear programmes, Math. Progr. 41 (1988) 385-392.

[15] P. Wolfe, A technique for resolving degeneracy in linear programming, SIAM J. 11 (1963) 205-211.