SEMISMOOTH METHODS FOR LINEAR AND NONLINEAR
SECOND-ORDER CONE PROGRAMS1
Christian Kanzow2 and Masao Fukushima3
April 19, 2006
Abstract
The optimality conditions of a nonlinear second-order cone program can be reformulated as a nonsmooth system of equations using a projection mapping. This allows the application of nonsmooth Newton methods for the solution of the nonlinear second-order cone program. Conditions for the local quadratic convergence of these nonsmooth Newton methods are investigated. Related conditions are also given for the special case of a linear second-order cone program. An interesting and important feature of these conditions is that they do not require strict complementarity of the solution.
Key Words: Linear second-order cone program, nonlinear second-order cone program,
semismooth function, nonsmooth Newton method, quadratic convergence
1This work was supported in part by the Scientific Research Grant-in-Aid from the Japan Society for the Promotion of Science.
2Institute of Mathematics, University of Würzburg, Am Hubland, 97074 Würzburg, Germany, e-mail: [email protected]
3Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan, e-mail: [email protected]
1 Introduction
We consider both the linear second-order cone program
min cᵀx s.t. Ax = b, x ∈ K,
and the nonlinear second-order cone program
min f(x) s.t. Ax = b, x ∈ K,
where f : Rn → R is a twice continuously differentiable function, A ∈ Rm×n is a given
matrix, b ∈ Rm and c ∈ Rn are given vectors, and
K = K1 × · · · × Kr
is a Cartesian product of second-order cones Ki ⊆ Rni , n1 + · · · + nr = n. Recall that the
second-order cone (or ice-cream cone or Lorentz cone) of dimension ni is defined by
Ki := {zi = (zi0, zi) ∈ R × Rni−1 | zi0 ≥ ‖zi‖},
where ‖ · ‖ denotes the Euclidean norm. Observe the special notation that is used in the
definition of Ki and that will be applied throughout this manuscript: For a given vector
z ∈ Rℓ for some ℓ ≥ 1, we write z = (z0, z), where z0 is the first component of the vector
z, and z consists of the remaining ℓ − 1 components of z.
The linear second-order cone program has been investigated in many previous works,
and we refer the interested reader to the two survey papers [16, 1] and the books [2, 4]
for many important applications and theoretical properties. Software for the solution of
linear second-order cone programs is also available, see, for example, [15, 24, 20, 23]. In
many cases, the linear second-order cone program is viewed as a special case of a (linear)
semidefinite program (see [1] for a suitable reformulation). However, we feel that the
second-order cone program should be treated directly since the reformulation of a second-
order cone constraint as a semidefinite constraint increases the dimension of the problem
significantly and, therefore, decreases the efficiency of any solver. In fact, many solvers for
semidefinite programs (see, for example, the list given on Helmberg’s homepage [13]) are
able to deal with second-order cone constraints separately.
The treatment of the nonlinear second-order cone program is much more recent, and,
at the moment, the number of publications is rather limited, see [3, 5, 6, 7, 8, 11, 12, 14,
21, 22, 25]. These papers deal with different topics; some of them investigate different
kinds of solution methods (interior-point methods, smoothing methods, SQP-type meth-
ods, or methods based on unconstrained optimization), while some of them consider certain
theoretical properties or suitable reformulations of the second-order cone program.
The method of choice for the solution of (at least) the linear second-order cone program
is currently an interior-point method. However, some recent preliminary tests indicate that
the class of smoothing or semismooth methods is sometimes superior to the class of interior-
point methods, especially for nonlinear problems, see [8, 12, 22]. On the other hand, the
theoretical properties of interior-point methods are much better understood than those of
the smoothing and semismooth methods.
The aim of this paper is to provide some results which help to understand the the-
oretical properties of semismooth methods being applied to both linear and nonlinear
second-order cone programs. The investigation here is of local nature, and we provide
sufficient conditions for those methods to be locally quadratically convergent. An interest-
ing and important feature of those sufficient conditions is that they do not require strict
complementarity of the solution. This is an advantage compared to interior-point methods
where singular Jacobians occur if strict complementarity is not satisfied.
The paper is organized as follows: Section 2 states a number of preliminary results
for a projection mapping onto a second-order cone, which will later be used in order to
reformulate the optimality conditions of the second-order cone program as a system of
equations. Section 3 then investigates the nonlinear second-order cone program, whereas
its linear counterpart is discussed in Section 4. We close with some final remarks in
Section 5.
Most of our notation is standard. For a differentiable mapping G : Rn → Rm, we denote
by G′(z) ∈ Rm×n the Jacobian of G at z. If G is locally Lipschitz continuous, the set
∂BG(z) := {H ∈ Rm×n | ∃ {zk} ⊆ DG : zk → z, G′(zk) → H}

is nonempty and called the B-subdifferential of G at z, where DG ⊆ Rn denotes the
set of points at which G is differentiable. The convex hull ∂G(z) := conv ∂BG(z) is the
generalized Jacobian of Clarke [9]. We assume that the reader is familiar with the concepts
of (strongly) semismooth functions, and refer to [19, 18, 17, 10] for details.
2 Projection Mapping onto Second-Order Cone
Throughout this section, let K be the single second-order cone
K := {z = (z0, z) ∈ R × Rn | z0 ≥ ‖z‖}.
In the subsequent sections, K will be the Cartesian product of second-order cones. The
results of this section will later be applied componentwise to each of the second-order cones
Ki in the Cartesian product.
Recall that the second-order cone K is self-dual, i.e. K∗ = K, where K∗ := {d ∈ R × Rn | zᵀd ≥ 0 ∀z ∈ K} denotes the dual cone of K, cf. [1, Lemma 1]. Hence the following result holds, see, e.g., [11, Proposition 4.1].
Lemma 2.1 The following equivalence holds:
x ∈ K, y ∈ K, xᵀy = 0 ⇐⇒ x − PK(x − y) = 0,
where PK(z) denotes the (Euclidean) projection of a vector z on K.
An explicit representation of the projection PK(z) is given in the following result, see [11,
Proposition 3.3].
Lemma 2.2 For any given z = (z0, z) ∈ R × Rn, we have

PK(z) = max{0, λ1} u(1) + max{0, λ2} u(2),

where λ1, λ2 are the spectral values and u(1), u(2) are the spectral vectors of z, respectively, given by

λ1 := z0 − ‖z‖, λ2 := z0 + ‖z‖,

u(1) := (1/2) (1, −z/‖z‖) if z ≠ 0, and u(1) := (1/2) (1, −w) if z = 0,

u(2) := (1/2) (1, z/‖z‖) if z ≠ 0, and u(2) := (1/2) (1, w) if z = 0,

where w is any vector in Rn with ‖w‖ = 1.
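The projection formula of Lemma 2.2 is straightforward to implement. The following NumPy sketch (the function name proj_soc and all naming conventions are ours, not from the paper) computes PK(z) via the spectral values and vectors:

```python
import numpy as np

def proj_soc(z):
    """Projection of z = (z0, zbar) onto the second-order cone
    K = {(z0, zbar) : z0 >= ||zbar||}, via the spectral
    decomposition of Lemma 2.2."""
    z0, zbar = z[0], np.asarray(z[1:], dtype=float)
    nz = np.linalg.norm(zbar)
    lam1, lam2 = z0 - nz, z0 + nz                  # spectral values
    if nz > 0:
        w = zbar / nz
    else:                                          # zbar = 0: any unit vector works
        w = np.zeros_like(zbar)
        w[0] = 1.0
    u1 = 0.5 * np.concatenate(([1.0], -w))         # spectral vectors
    u2 = 0.5 * np.concatenate(([1.0], w))
    return max(0.0, lam1) * u1 + max(0.0, lam2) * u2
```

As a consistency check, x := PK(z) and y := x − z satisfy the complementarity relation of Lemma 2.1 (x ∈ K, y ∈ K, xᵀy = 0), which follows from the Moreau decomposition of z with respect to the self-dual cone K.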
It is well-known that the projection mapping onto an arbitrary closed convex set is non-
expansive and hence is Lipschitz continuous. When the set is the second-order cone K, a
stronger smoothness property can be shown, see [12, Proposition 4.5].
Lemma 2.3 The projection mapping PK is strongly semismooth.
We next characterize the points at which the projection mapping PK is differentiable.
Lemma 2.4 The projection mapping PK is differentiable at a point z = (z0, z) ∈ R × Rn if
and only if z0 ≠ ±‖z‖ holds. In fact, the projection mapping is continuously differentiable
at every z such that z0 ≠ ±‖z‖.
Proof. The statement can be derived directly from the representation of PK(z) given in
Lemma 2.2. Alternatively, it can be derived as a special case of more general results stated
in [7], see, in particular, Propositions 4 and 5 in that reference. □
We next calculate the Jacobian of the projection mapping PK at a point where it is differ-
entiable.
Lemma 2.5 The Jacobian of PK at a point z = (z0, z) ∈ R × Rn with z0 ≠ ±‖z‖ is given by

P′K(z) = 0 if z0 < −‖z‖,

P′K(z) = In+1 if z0 > +‖z‖,

P′K(z) = (1/2) [ 1 wᵀ ; w H ] if −‖z‖ < z0 < +‖z‖,

where

w := z/‖z‖, H := (z0/‖z‖ + 1) In − (z0/‖z‖) wwᵀ.

(Note that the denominator is automatically nonzero in this case.)
Proof. First assume that z ≠ 0 holds.

Consider the case z0 < −‖z‖. This implies not only z0 + ‖z‖ < 0 but also z0 − ‖z‖ < 0.
Therefore, in view of Lemma 2.2, it follows that PK is the zero mapping in a neighborhood
of the given point z. Hence the Jacobian at z is the zero matrix.

Next consider the case z0 > +‖z‖. Then z belongs to the interior of the cone K and,
therefore, PK is the identity mapping in a small neighborhood of z (this can also be verified
by direct calculation using Lemma 2.2 once again). Hence the Jacobian at z is the identity
matrix.

Now let z0 ∈ (−‖z‖, +‖z‖). Then z0 + ‖z‖ > 0 and z0 − ‖z‖ < 0. Consequently, using
Lemma 2.2 again, we have (locally)

PK(z) = (1/2)(z0 + ‖z‖) (1, z/‖z‖).

Let us write

θ(z) := z/‖z‖.

Since z ≠ 0, it is easy to see that θ is differentiable at z with Jacobian

θ′(z) = (1/‖z‖²) (‖z‖ In − (1/‖z‖) zzᵀ).

Hence we obtain

P′K(z) = (1/2)(z0 + ‖z‖) [ 0 0 ; 0 θ′(z) ] + (1/2) [ 1 ; z/‖z‖ ] (1, zᵀ/‖z‖)

= (1/2)(z0 + ‖z‖) [ 0 0 ; 0 (1/‖z‖²)(‖z‖ In − (1/‖z‖) zzᵀ) ] + (1/2) [ 1 zᵀ/‖z‖ ; z/‖z‖ zzᵀ/‖z‖² ]

= (1/2) [ 1 zᵀ/‖z‖ ; z/‖z‖ (z0/‖z‖ + 1) In − (z0/‖z‖) zzᵀ/‖z‖² ],

which is precisely the statement in this case.

Finally, let us consider the case where z = 0. Then the statement is a special case of
[11, Proposition 5.2]. However, since PK is continuously differentiable at (z0, 0) for any
z0 ≠ 0 by Lemma 2.4, the Jacobian at (z0, z) can also be derived by taking a limit of a
sequence {zk} = {(zk0, zk)} converging to z = (z0, 0) and satisfying zk ≠ 0. □
Based on the previous results, we are now in a position to give an expression for the elements of the B-subdifferential ∂BPK(z) at an arbitrary point z. A similar representation of the elements of the Clarke generalized Jacobian ∂PK(z) is given in [12, Proposition 4.8], without discussion of conditions for the nonsingularity of those matrices. Here we prefer to deal with the smaller set ∂BPK(z) since this will simplify our subsequent analysis to give sufficient conditions for the nonsingularity of all elements in ∂BPK(z). In fact, the nonsingularity of all elements of the B-subdifferential usually holds under weaker assumptions than the nonsingularity of all elements of the corresponding Clarke generalized Jacobian.
Lemma 2.6 Given a general point z = (z0, z) ∈ R × Rn, each element V ∈ ∂BPK(z) has
the following representation:

(a) If z0 ≠ ±‖z‖, then PK is continuously differentiable at z and V = P′K(z) with the
Jacobian P′K(z) given in Lemma 2.5.

(b) If z ≠ 0 and z0 = +‖z‖, then

V ∈ { In+1, (1/2) [ 1 wᵀ ; w H ] },

where w := z/‖z‖ and H := 2In − wwᵀ.

(c) If z ≠ 0 and z0 = −‖z‖, then

V ∈ { 0, (1/2) [ 1 wᵀ ; w H ] },

where w := z/‖z‖ and H := wwᵀ.

(d) If z = 0 and z0 = 0, then either V = 0 or V = In+1 or V belongs to the set

{ (1/2) [ 1 wᵀ ; w H ] | H = (w0 + 1)In − w0wwᵀ for some |w0| ≤ 1 and ‖w‖ = 1 }.
Proof. Throughout the proof, we denote by D the set of points where the mapping PK is differentiable. Recall that this set is characterized in Lemma 2.4.
(a) Under the stated assumptions on (z0, z), it follows from Lemma 2.4 that PK is continu-
ously differentiable at z. Hence the B-subdifferential consists of one element only, namely
the Jacobian P ′K(z). This observation gives the first statement.
(b) Let z ≠ 0 and z0 = +‖z‖. Furthermore, let {zk} ⊆ D be an arbitrary sequence
converging to z. Writing zk = (zk0, zk), we can assume without loss of generality that, for
each k ∈ N, we have either −‖zk‖ < zk0 < ‖zk‖ or zk0 > ‖zk‖. If zk0 > ‖zk‖ for all k ∈ N,
we have P′K(zk) = In+1 by Lemma 2.5 and, therefore, P′K(zk) → In+1 as k → ∞. On the
other hand, if −‖zk‖ < zk0 < ‖zk‖ for all k ∈ N, Lemma 2.5 shows that

P′K(zk) = (1/2) [ 1 (wk)ᵀ ; wk Hk ]

for all k ∈ N, where

wk := zk/‖zk‖ and Hk := (zk0/‖zk‖ + 1) In − (zk0/‖zk‖) wk(wk)ᵀ. (1)

Taking the limit k → ∞, we therefore obtain

P′K(zk) → (1/2) [ 1 wᵀ ; w H ] with w := z/‖z‖, H := 2In − wwᵀ.

Finally, if the sequence {zk} is such that zk0 > ‖zk‖ for some k ∈ N and −‖zk‖ < zk0 < ‖zk‖
for the remaining k ∈ N, we do not get any other limiting elements for P′K(zk).
(c) Let z ≠ 0 and z0 = −‖z‖. Moreover, let {zk} ⊆ D be any sequence converging to z.
Writing zk = (zk0, zk) for each k ∈ N, we may assume without loss of generality that, for
each k ∈ N, we have either zk0 < −‖zk‖ or −‖zk‖ < zk0 < ‖zk‖. For the same reason as in
the proof of part (b), it suffices to consider the two cases: (i) zk0 < −‖zk‖ for all k ∈ N,
and (ii) −‖zk‖ < zk0 < ‖zk‖ for all k ∈ N. In the former case, it follows from Lemma 2.5
that P′K(zk) = 0 for all k ∈ N, hence P′K(zk) → 0 as k → ∞. In the latter case, it also
follows from Lemma 2.5 that

P′K(zk) = (1/2) [ 1 (wk)ᵀ ; wk Hk ]

with wk, Hk given by (1). Taking the limit k → ∞ gives

P′K(zk) → (1/2) [ 1 wᵀ ; w H ] with w := z/‖z‖, H := wwᵀ.
(d) Let z = 0 and z0 = 0. Let {zk} = {(zk0, zk)} ⊆ D be any sequence converging to
z = (z0, z) = (0, 0). Then there are three possibilities: (i) zk0 < −‖zk‖, in which case we
have P′K(zk) = 0 by Lemma 2.5, (ii) zk0 > ‖zk‖, in which case we have P′K(zk) = In+1 by
Lemma 2.5, and (iii) −‖zk‖ < zk0 < ‖zk‖, in which case we have

P′K(zk) = (1/2) [ 1 (wk)ᵀ ; wk Hk ]

with wk, Hk given by (1), again by Lemma 2.5. Taking the limit (possibly on a subsequence)
in each of these cases, we have either P′K(zk) → 0 or P′K(zk) → In+1 or

P′K(zk) → (1/2) [ 1 wᵀ ; w H ] with H := (w0 + 1)In − w0wwᵀ

for some vector w = (w0, w) satisfying |w0| ≤ 1 and ‖w‖ = 1. This completes the proof. □
We can summarize Lemma 2.6 as follows: Any element V ∈ ∂BPK(z) is equal to

V = 0 or V = In+1 or V = (1/2) [ 1 wᵀ ; w H ] (2)

for some vector w ∈ Rn with ‖w‖ = 1 and some matrix H ∈ Rn×n of the form H =
(1 + α)In − αwwᵀ with some scalar α ∈ R satisfying |α| ≤ 1. Specifically, in statements
(a)–(c) we have w = z/‖z‖, whereas in statement (d), w can be any vector of length one.
Moreover, we have α = z0/‖z‖ in statement (a), α = 1 in statement (b), α = −1 in
statement (c), whereas there is no further specification of α in statement (d) (here the two
simple cases V = 0 and V = In+1 are always excluded).
The eigenvalues of any matrix V ∈ ∂BPK(z) can be given explicitly, as shown in the
following result.
Lemma 2.7 Let z = (z0, z) ∈ R × Rn and V ∈ ∂BPK(z). Assume that V ∉ {0, In+1} so
that V has the third representation in (2) with H = (1 + α)In − αwwᵀ for some scalar
α ∈ [−1, +1] and some vector w ∈ Rn satisfying ‖w‖ = 1. Then V has the two single
eigenvalues λ = 0 and λ = 1 as well as the eigenvalue λ = (1/2)(α + 1) with multiplicity
n − 1 (unless α = ±1, where the multiplicities change in an obvious way). Moreover, the
corresponding eigenvectors of V are given by

(−1, w), (1, w), and (0, vj), j = 1, . . . , n − 1,
where v1, . . . , vn−1 are arbitrary vectors that span the linear subspace {v ∈ Rn | vT w = 0}.
Proof. By assumption, we have

V = (1/2) [ 1 wᵀ ; w H ] with H = (1 + α)In − αwwᵀ

for some α ∈ [−1, +1] and some vector w satisfying ‖w‖ = 1. Now take an arbitrary vector
v ∈ Rn orthogonal to w, and let u = (0, vᵀ)ᵀ. Then an elementary calculation shows that
V u = λu holds for λ = (1/2)(α + 1). Hence this λ is an eigenvalue of V with multiplicity n − 1
since we can choose n − 1 linearly independent vectors v ∈ Rn such that vᵀw = 0. On the
other hand, if λ = 0, it is easy to see that V u = λu holds with u = (−1, wᵀ)ᵀ, whereas for
λ = 1 we have V u = λu by taking u = (1, wᵀ)ᵀ. This completes the proof. □
Note that Lemma 2.7 particularly implies λ ∈ [0, 1] for all eigenvalues λ of V . This
fact can alternatively be derived from the fact that PK is a projection mapping, without
referring to the explicit representation of V as given in Lemma 2.6.
We close this section by pointing out an interesting relation between the matrix V ∈ ∂BPK(z) and the so-called arrow matrix

Arw(z) := [ z0 zᵀ ; z z0In ] ∈ R(n+1)×(n+1)

associated with z = (z0, z) ∈ R × Rn, which frequently occurs in the context of interior-point methods and in the analysis of second-order cone problems, see, e.g., [1]. To this end, consider the case where PK is differentiable at z, excluding the two trivial cases where P′K(z) = 0 and P′K(z) = In+1, cf. Lemma 2.5. Then by Lemma 2.7, the eigenvectors of the matrix V = P′K(z) are given by

(−1, z/‖z‖), (1, z/‖z‖), and (0, vj), j = 1, . . . , n − 1, (3)

where v1, . . . , vn−1 comprise an orthogonal basis of the linear subspace {v ∈ Rn | vᵀz = 0}. However, an elementary calculation shows that these are also the eigenvectors of the arrow matrix Arw(z), with corresponding single eigenvalues λ1 = z0 − ‖z‖, λ2 = z0 + ‖z‖ and the multiple eigenvalue λi = z0, i = 3, . . . , n + 1. Therefore, although the eigenvalues of V = P′K(z) and Arw(z) are different, both matrices have the same set of eigenvectors.
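This relation is easy to check numerically: since V = P′K(z) and Arw(z) are symmetric and share a complete set of eigenvectors, they must in particular commute. A small NumPy sketch with hand-picked sample data (our own choice, not from the paper):

```python
import numpy as np

z0, zbar = 1.0, np.array([3.0, 4.0])   # a point with -||zbar|| < z0 < ||zbar||
nz = np.linalg.norm(zbar)              # = 5
w, alpha = zbar / nz, z0 / nz

# V = P'_K(z) from Lemma 2.5 ...
H = (alpha + 1.0) * np.eye(2) - alpha * np.outer(w, w)
V = 0.5 * np.block([[np.ones((1, 1)), w[None, :]],
                    [w[:, None],      H]])

# ... and the arrow matrix Arw(z)
Arw = np.block([[np.array([[z0]]), zbar[None, :]],
                [zbar[:, None],    z0 * np.eye(2)]])

# shared eigenvectors from (3): (-1, w) and (1, w)
u1 = np.concatenate(([-1.0], w))
u2 = np.concatenate(([1.0], w))
```

Here Arw(z)u1 = (z0 − ‖z‖)u1 and Arw(z)u2 = (z0 + ‖z‖)u2, while V u1 = 0 and V u2 = u2, exactly as the discussion above predicts.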
3 Nonlinear Second-Order Cone Programs
In this section, we consider the nonlinear second-order cone program (nonlinear SOCP for
short)
min f(x) s.t. Ax = b, x ∈ K,
where f : Rn → R is a twice continuously differentiable function, A ∈ Rm×n is a given
matrix, b ∈ Rm is a given vector, and K = K1 × · · · × Kr is the Cartesian product of
second-order cones Ki ⊆ Rni with n1 + · · ·+ nr = n. Here we are particularly interested in
the case where the objective function f is nonlinear. The special case where f is a linear
function will be discussed in more detail in the subsequent section.
Under certain conditions like convexity of f and a Slater-type constraint qualification
[4], solving the nonlinear SOCP is equivalent to solving the corresponding KKT conditions,
which can be written as follows:
∇f(x) − Aᵀµ − λ = 0,
Ax = b,
xi ∈ Ki, λi ∈ Ki, xiᵀλi = 0, i = 1, . . . , r.
Using Lemma 2.1, it follows that these KKT conditions are equivalent to the system of
equations M(z) = 0, where M : Rn × Rm × Rn → Rn × Rm × Rn is defined by
M(z) := M(x, µ, λ) := [ ∇f(x) − Aᵀµ − λ ; Ax − b ; x1 − PK1(x1 − λ1) ; … ; xr − PKr(xr − λr) ]. (4)
Then we can apply the nonsmooth Newton method [18, 19, 17]

zk+1 := zk − Wk⁻¹ M(zk), Wk ∈ ∂BM(zk), k = 0, 1, 2, . . . , (5)
to the system of equations M(z) = 0 in order to solve the nonlinear SOCP or, at least, the
corresponding KKT conditions. Our aim is to show fast local convergence of this iterative
method. In view of the results in [19, 18], we have to guarantee that, on one hand, the
mapping M , though not differentiable everywhere, is still sufficiently ‘smooth’, and, on the
other hand, it satisfies a local nonsingularity condition under suitable assumptions.
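To make the iteration (5) concrete, the following self-contained NumPy sketch applies it to a deliberately simple toy instance, min ½‖x − a‖² s.t. x ∈ K, with a single cone and no equality constraints (this instance and all names are our own; at points where PK is not differentiable one would select an element of the B-subdifferential per Lemma 2.6, while this sketch simply uses the closed-form Jacobian of Lemma 2.5 on the smooth pieces):

```python
import numpy as np

def proj_soc(z):
    # projection onto the second-order cone K (Lemma 2.2)
    z0, zb = z[0], z[1:]
    nz = np.linalg.norm(zb)
    if z0 >= nz:
        return z.astype(float).copy()
    if z0 <= -nz:
        return np.zeros_like(z, dtype=float)
    t = 0.5 * (z0 + nz)
    return np.concatenate(([t], (t / nz) * zb))

def proj_soc_jac(z):
    # Jacobian of the projection (Lemma 2.5); at kinks one would instead
    # pick any element of the B-subdifferential described in Lemma 2.6
    z0, zb = z[0], z[1:]
    n, nz = len(zb), np.linalg.norm(zb)
    if z0 <= -nz:
        return np.zeros((n + 1, n + 1))
    if z0 >= nz:
        return np.eye(n + 1)
    w = zb / nz
    H = (z0 / nz + 1.0) * np.eye(n) - (z0 / nz) * np.outer(w, w)
    return 0.5 * np.block([[np.ones((1, 1)), w[None, :]], [w[:, None], H]])

def semismooth_newton(a, x, lam, tol=1e-12, iters=20):
    """Iteration (5) for the toy problem  min 1/2 ||x - a||^2  s.t. x in K
    (single cone, no equality constraints), whose KKT system is
    M(x, lam) = (x - a - lam, x - PK(x - lam)) = 0."""
    n = len(a)
    for _ in range(iters):
        M = np.concatenate([x - a - lam, x - proj_soc(x - lam)])
        if np.linalg.norm(M) < tol:
            break
        V = proj_soc_jac(x - lam)              # builds an element Wk of the B-subdifferential
        W = np.block([[np.eye(n), -np.eye(n)],
                      [np.eye(n) - V, V]])
        d = np.linalg.solve(W, -M)
        x, lam = x + d[:n], lam + d[n:]
    return x, lam
```

On this instance the KKT system forces x = PK(a) and λ = x − a, and starting from (x, λ) = (a, 0) the iteration reaches this solution in a single step; for a = (0, 3, 4) that gives x = (2.5, 1.5, 2.0) on the boundary of K with xᵀλ = 0.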
The required smoothness property of M is summarized in the following result.
Theorem 3.1 The mapping M defined by (4) is semismooth. Moreover, if the Hessian
∇2f is locally Lipschitz continuous, then the mapping M is strongly semismooth.
Proof. Note that a continuously differentiable mapping is semismooth. Moreover, if the
Jacobian of a differentiable mapping is locally Lipschitz continuous, then this mapping is
strongly semismooth. Now Lemma 2.3 and the fact that a given mapping is (strongly)
semismooth if and only if all component functions are (strongly) semismooth yield the
desired result. □
Our next step is to provide suitable conditions which guarantee the nonsingularity of all
elements of the B-subdifferential of M at a KKT point. This requires some more work,
and we begin with the following general result.
Proposition 3.2 Let H ∈ Rn×n be symmetric, and A ∈ Rm×n. Let Vᵃ, Vᵇ ∈ Rn×n be two
symmetric positive semidefinite matrices such that their sum Vᵃ + Vᵇ is positive definite
and Vᵃ and Vᵇ have a common basis of eigenvectors, so that there exist an orthogonal
matrix Q ∈ Rn×n and diagonal matrices Dᵃ = diag(a1, . . . , an) and Dᵇ = diag(b1, . . . , bn)
satisfying Vᵃ = QDᵃQᵀ, Vᵇ = QDᵇQᵀ as well as aj ≥ 0, bj ≥ 0 and aj + bj > 0 for all
j = 1, . . . , n. Let the index set {1, . . . , n} be partitioned as {1, . . . , n} = α ∪ β ∪ γ, where

α := {j | aj > 0, bj = 0}, β := {j | aj > 0, bj > 0}, γ := {j | aj = 0, bj > 0},

and let Qα, Qβ, and Qγ denote the submatrices of Q consisting of the columns from Q
corresponding to the index sets α, β, and γ, respectively. Assume that the following two
conditions hold:

(a) The matrix H is positive semidefinite on the subspace Sα := {d ∈ Rn | Ad = 0,
Qαᵀd = 0}, and positive definite on the subspace Sα,β := {d ∈ Rn | Ad = 0, Qαᵀd = 0,
Qβᵀd = 0} = Sα ∩ {d ∈ Rn | Qβᵀd = 0}.

(b) The matrix (AQβ, AQγ) has full row rank.

Then the matrix

W := [ H −Aᵀ −I ; A 0 0 ; Vᵃ 0 Vᵇ ]

is nonsingular.
Proof. Let y = (y(1), y(2), y(3)) ∈ Rn × Rm × Rn be any vector such that Wy = 0. Then

Hy(1) − Aᵀy(2) − y(3) = 0, (6)
Ay(1) = 0, (7)
Vᵃy(1) + Vᵇy(3) = 0. (8)

Using the spectral decompositions of Vᵃ and Vᵇ, equation (8) can be rewritten as

Dᵃȳ(1) + Dᵇȳ(3) = 0 with ȳ(1) := Qᵀy(1), ȳ(3) := Qᵀy(3).

In view of the definitions of the index sets α, β, and γ, this implies

ȳ(1)α = 0, ȳ(3)γ = 0, and Dᵃβ ȳ(1)β + Dᵇβ ȳ(3)β = 0. (9)

This shows that

(y(1))ᵀy(3) = (y(1))ᵀQQᵀy(3) = (ȳ(1))ᵀȳ(3) = (ȳ(1)β)ᵀȳ(3)β = −(ȳ(1)β)ᵀ(Dᵇβ)⁻¹Dᵃβ ȳ(1)β.

Therefore, premultiplying (6) by (y(1))ᵀ, we obtain from (7) that

(y(1))ᵀHy(1) = (y(1))ᵀy(3) = −(ȳ(1)β)ᵀ(Dᵇβ)⁻¹Dᵃβ ȳ(1)β ≤ 0, (10)

and the inequality is strict whenever ȳ(1)β ≠ 0. However, in view of (7) and 0 = ȳ(1)α =
Qαᵀy(1), cf. (9), we obtain from the positive semidefiniteness of H on the subspace Sα that
ȳ(1)β = 0, i.e., Qβᵀy(1) = 0. This in turn gives ȳ(3)β = 0 by (9). Using (10) and the fact that
y(1) ∈ Sα,β, we then obtain y(1) = 0 from the assumed positive definiteness of H on the
subspace Sα,β. Now premultiplying (6) by Qᵀ and writing down the corresponding block
equations for the β and γ blocks, we obtain from y(1) = 0, ȳ(3)β = 0, and ȳ(3)γ = 0 that

Qβᵀ Aᵀ y(2) = 0 and Qγᵀ Aᵀ y(2) = 0.

Hence y(2) = 0 follows from Assumption (b). Finally, by (6), we have y(3) = 0. □
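Proposition 3.2 can be probed on a small hand-built instance (our own toy data, with Q = I so that the eigenvector structure is trivial); the first choice of A below satisfies condition (b) and yields a nonsingular W, while the second violates (b) and produces a singular W:

```python
import numpy as np

# Toy instance with n = 3, Q = I, so alpha = {0}, beta = {1}, gamma = {2}.
Da = np.diag([1.0, 1.0, 0.0])        # eigenvalues a_j of V^a
Db = np.diag([0.0, 1.0, 1.0])        # eigenvalues b_j of V^b (a_j + b_j > 0)
H = np.eye(3)                        # positive definite everywhere -> (a) holds

def build_W(A):
    m = A.shape[0]
    return np.block([[H,  -A.T,              -np.eye(3)],
                     [A,   np.zeros((m, m)),  np.zeros((m, 3))],
                     [Da,  np.zeros((3, m)),  Db]])

A_ok = np.array([[0.0, 1.0, 1.0]])   # (A Q_beta, A Q_gamma) = (1 1): full row rank
A_bad = np.array([[1.0, 0.0, 0.0]])  # (A Q_beta, A Q_gamma) = (0 0): (b) fails
```

For A_bad a kernel vector of W can be written down directly: y(1) = 0, y(2) = 1, y(3) = −Aᵀ = (−1, 0, 0) satisfies all three block equations because Vᵇ annihilates y(3), so condition (b) is not merely technical.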
In order to apply Proposition 3.2 to the (generalized) Jacobian of the mapping M at a
KKT point, we first introduce some more notation:
intKi := {xi | xi0 > ‖xi‖} denotes the interior of Ki,

bdKi := {xi | xi0 = ‖xi‖} denotes the boundary of Ki, and

bd+Ki := bdKi \ {0} is the boundary of Ki excluding the origin.
We also call a KKT point z∗ = (x∗, µ∗, λ∗) of the nonlinear SOCP strictly complementary
if x∗i + λ∗i ∈ intKi holds for all block components i = 1, . . . , r. This notation enables us to
restate the following result from [1].
Lemma 3.3 Let z∗ = (x∗, µ∗, λ∗) be a KKT point of the nonlinear SOCP. Then precisely
one of the following six cases holds for each block pair (x∗i , λ∗i ), i = 1, . . . , r:
x∗i             λ∗i             SC
x∗i ∈ intKi     λ∗i = 0         yes
x∗i = 0         λ∗i ∈ intKi     yes
x∗i ∈ bd+Ki     λ∗i ∈ bd+Ki     yes
x∗i ∈ bd+Ki     λ∗i = 0         no
x∗i = 0         λ∗i ∈ bd+Ki     no
x∗i = 0         λ∗i = 0         no
The last column in the table indicates whether or not strict complementarity (SC) holds.
We also need the following simple result which, in particular, shows that the projection mapping PKi involved in the definition of the mapping M is continuously differentiable at si := x∗i − λ∗i for any block component i satisfying strict complementarity.
Lemma 3.4 Let z∗ = (x∗, µ∗, λ∗) be a KKT point of the nonlinear SOCP. Then the following statements hold for each block pair (x∗i, λ∗i):

(a) If x∗i ∈ intKi and λ∗i = 0, then PKi is continuously differentiable at si := x∗i − λ∗i with P′Ki(si) = I.

(b) If x∗i = 0 and λ∗i ∈ intKi, then PKi is continuously differentiable at si := x∗i − λ∗i with P′Ki(si) = 0.

(c) If x∗i ∈ bd+Ki and λ∗i ∈ bd+Ki, then PKi is continuously differentiable at si := x∗i − λ∗i.

Proof. (a) We have si = x∗i − λ∗i = x∗i ∈ intKi. Hence, locally around si, the projection mapping PKi is the identity mapping, so that its Jacobian equals the identity matrix.

(b) We have si = x∗i − λ∗i = −λ∗i with λ∗i ∈ intKi. Hence, locally around si, the projection PKi maps everything onto the zero vector. This implies P′Ki(si) = 0.
(c) Write x∗i = (x∗i0, x∗i), λ∗i = (λ∗i0, λ∗i). By assumption, we have x∗i0 = ‖x∗i‖ ≠ 0 and λ∗i0 = ‖λ∗i‖ ≠ 0. Let si := x∗i − λ∗i and write si = (si0, si). In view of Lemma 2.4, it suffices to show that si0 ≠ ±‖si‖. First suppose that si0 = ‖si‖. Then x∗i0 − λ∗i0 = ‖x∗i − λ∗i‖. However, since x∗i ≠ 0 and λ∗i ≠ 0, it follows from [1, Lemma 15] that there is a constant α > 0 such that λ∗i0 = αx∗i0 and λ∗i = −αx∗i. Consequently, we obtain (1 − α)x∗i0 = (1 + α)‖x∗i‖ = (1 + α)x∗i0, which implies α = 0, a contradiction to α > 0. In a similar way, one gets a contradiction for the case si0 = −‖si‖. □
We are now almost in a position to apply Proposition 3.2 to the Jacobian of the mapping
M at a KKT point z∗ = (x∗, µ∗, λ∗) provided that this KKT point satisfies strict com-
plementarity. This strict complementarity assumption will be removed later, but for the
moment it is quite convenient to assume this condition. For example, it then follows from
Lemma 3.3 that the three index sets
JI := {i | x∗i ∈ intKi, λ∗i = 0},
JB := {i | x∗i ∈ bd+Ki, λ∗i ∈ bd+Ki}, (11)
J0 := {i | x∗i = 0, λ∗i ∈ intKi}
form a partition of the block indices i = 1, . . . , r. Here, the subscripts I, B and 0 indicate
whether the block component x∗i belongs to the interior of the cone Ki, or x∗i belongs to
the boundary of Ki (excluding the zero vector), or x∗i is the zero vector.
Let Vi := P′Ki(x∗i − λ∗i). Then Lemma 3.4 implies that

Vi = I ∀i ∈ JI and Vi = 0 ∀i ∈ J0. (12)
To get a similar representation for indices i ∈ JB, we need the spectral decompositions
Vi = QiDiQiᵀ of the matrices Vi. Since strict complementarity holds, it follows from Lemma
2.7 that each Vi has precisely one eigenvalue equal to zero and precisely one eigenvalue equal
to one, whereas all other eigenvalues are strictly between zero and one. Without loss of
generality, we can assume that the eigenvalues of Vi are ordered in such a way that

Di = diag(0, ∗, . . . , ∗, 1) ∀i ∈ JB, (13)

where ∗ denotes the remaining eigenvalues that lie in the open interval (0, 1). Correspondingly, we also partition the orthogonal matrices Qi as

Qi = (qi, Q̂i, q′i) ∀i ∈ JB, (14)

where qi ∈ Rni denotes the first column of Qi, q′i ∈ Rni is the last column of Qi, and
Q̂i ∈ Rni×(ni−2) contains the remaining ni − 2 middle columns of Qi. We also use the
following partitionings of the matrices Qi:

Qi = (qi, Q̃i) = (Q̄i, q′i) ∀i ∈ JB, (15)

where, again, qi ∈ Rni and q′i ∈ Rni are the first and the last columns of Qi, respectively,
and Q̃i = (Q̂i, q′i) ∈ Rni×(ni−1) and Q̄i = (qi, Q̂i) ∈ Rni×(ni−1) contain the last and the first
ni − 1 columns of Qi, respectively.
worth noticing that, by (3), the vectors qi and q′i are actually given by

qi = (1/√2) (−1, (x∗i − λ∗i)/‖x∗i − λ∗i‖) and q′i = (1/√2) (1, (x∗i − λ∗i)/‖x∗i − λ∗i‖).

(Note that we have x∗i − λ∗i ≠ 0 since x∗iᵀλ∗i = 0, x∗i ∈ bd+Ki, λ∗i ∈ bd+Ki [1, Lemma 15].)
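The eigenvector structure of Lemma 2.7 together with the column partitioning (14)/(15) can be verified numerically: stacking the normalized eigenvectors into an orthogonal matrix Q diagonalizes V. A sketch with arbitrary sample data (the choices α = 0.3 and a random unit vector w are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
w = rng.normal(size=n)
w /= np.linalg.norm(w)
alpha = 0.3                                   # any value in (-1, 1)

# V as in (2): V = 1/2 [1 w^T ; w H] with H = (1 + alpha) I - alpha w w^T
H = (1.0 + alpha) * np.eye(n) - alpha * np.outer(w, w)
V = 0.5 * np.block([[np.ones((1, 1)), w[None, :]],
                    [w[:, None],      H]])

# orthonormal eigenvectors: q (eigenvalue 0), q' (eigenvalue 1), and the
# middle columns (0, v_j) with v_1, ..., v_{n-1} spanning {v : w^T v = 0}
q = np.concatenate(([-1.0], w)) / np.sqrt(2.0)
qp = np.concatenate(([1.0], w)) / np.sqrt(2.0)
B = np.linalg.svd(np.outer(w, w))[0][:, 1:]   # orthonormal complement of span{w}
mid = np.vstack([np.zeros(n - 1), B])
Q = np.column_stack([q, mid, qp])

D = Q.T @ V @ Q                               # diag(0, (1+alpha)/2, ..., (1+alpha)/2, 1)
```

The factor 1/√2 normalizes the first and last columns, which is exactly why it appears in the formulas for qi and q′i above.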
We are now able to prove the following nonsingularity result under the assumption that
the given KKT point satisfies strict complementarity.
Theorem 3.5 Let z∗ = (x∗, µ∗, λ∗) be a strictly complementary KKT point of the nonlinear SOCP and let the (block) index sets JI, JB, J0 be defined by (11). Assume the following conditions:

(a) The Hessian ∇2f(x∗) is positive semidefinite on the subspace S := {d ∈ Rn | Ad = 0, di = 0 ∀i ∈ J0, qiᵀdi = 0 ∀i ∈ JB}, and positive definite on the subspace S̄ := {d ∈ Rn | Ad = 0, di = 0 ∀i ∈ J0, Q̄iᵀdi = 0 ∀i ∈ JB} = S ∩ {d ∈ Rn | Q̂iᵀdi = 0 ∀i ∈ JB}, where a vector d ∈ Rn is partitioned into d = (d1, . . . , dr) with block components di ∈ Rni, and Q̄i = (qi, Q̂i) denotes the matrix of the first ni − 1 columns of Qi, cf. (15).

(b) The matrix (Ai (i ∈ JI), AiQ̃i (i ∈ JB)) has linearly independent rows, where Ai is the submatrix of A consisting of those columns which correspond to the block index i, and Q̃i = (Q̂i, q′i) consists of the last ni − 1 columns of Qi.

Then the Jacobian M′(z∗) exists and is nonsingular.
Proof. The existence of the Jacobian M′(z∗) follows immediately from the assumed strict
complementarity of the given KKT point together with Lemma 3.4. A simple calculation
shows that

M′(z∗) = [ ∇2f(x∗) −Aᵀ −I ; A 0 0 ; I − V 0 V ]

for the block diagonal matrix V = diag(V1, . . . , Vr) with Vi = P′Ki(x∗i − λ∗i). Therefore,
taking into account that all eigenvalues of the matrix V belong to the interval [0, 1] by
Lemma 2.7, we are able to apply Proposition 3.2 (with Vᵃ := I − V and Vᵇ := V) as soon
as we have identified the index sets α, β, γ ⊆ {1, . . . , n}.

For each i ∈ JI, we have Vi = I (see (12)) and, therefore, Qi = I and Di = I. Hence
all components j from the block components i ∈ JI belong to the index set γ.
On the other hand, for each i ∈ J0, we have Vi = 0 (see (12)), and this corresponds to
Qi = I and Di = 0. Hence all components j from the block components i ∈ J0 belong to
the index set α.

Finally, let i ∈ JB. Then Vi = QiDiQiᵀ with Di = diag(0, ∗, . . . , ∗, 1) and Qi =
(qi, Q̂i, q′i). Hence the first component for each block index i ∈ JB is an element of the
index set α, the last component for each block index i ∈ JB belongs to the index set γ,
and all the remaining middle components belong to the index set β.

Taking these considerations into account, we immediately see that our assumptions (a)
and (b) correspond precisely to the assumptions (a) and (b) in Proposition 3.2. □
We now want to extend Theorem 3.5 to the case where strict complementarity is violated.
Let z∗ = (x∗, µ∗, λ∗) be an arbitrary KKT point of the nonlinear SOCP, and let JI, JB, J0
denote the index sets defined by (11). In view of Lemma 3.3, in addition to these sets, we
also need to consider the three index sets
JB0 := {i | x∗i ∈ bd+Ki, λ∗i = 0},
J0B := {i | x∗i = 0, λ∗i ∈ bd+Ki}, (16)
J00 := {i | x∗i = 0, λ∗i = 0},
which correspond to the block indices where strict complementarity is violated. Note
that these index sets have double subscripts; the first (resp. second) subscript indicates
whether x∗i (resp. λ∗i ) is on the boundary of Ki (excluding zero) or equal to the zero vector.
The following result summarizes the structure of the matrices Vi ∈ ∂BPKi(x∗i − λ∗i ) for
i ∈ JB0 ∪ J0B ∪ J00. Hence it is a counterpart of Lemma 3.4 in the general case.
Lemma 3.6 Let i ∈ JB0 ∪ J0B ∪ J00 and Vi ∈ ∂BPKi(x∗i − λ∗i). Then the following statements hold:

(a) If i ∈ JB0, then we have either Vi = I or Vi = QiDiQiᵀ with Di = diag(0, 1, . . . , 1) and Qi = (qi, Q̃i), where, as in (15), Q̃i consists of the last ni − 1 columns of Qi.

(b) If i ∈ J0B, then we have either Vi = 0 or Vi = QiDiQiᵀ with Di = diag(0, . . . , 0, 1) and Qi = (Q̄i, q′i), where Q̄i consists of the first ni − 1 columns of Qi.

(c) If i ∈ J00, then we have Vi = I or Vi = 0 or Vi = QiDiQiᵀ with Di and Qi given by (13) and (14), respectively, or by Di = diag(0, 1, . . . , 1) and Qi = (qi, Q̃i), or by Di = diag(0, . . . , 0, 1) and Qi = (Q̄i, q′i).
Proof. First let i ∈ JB0. Then si := x∗i − λ∗i = x∗i ∈ bd+Ki. Therefore, if we write
si = (si0, si), it follows that si0 = ‖si‖ and si ≠ 0. Statement (a) then follows immediately
from Lemma 2.6 (b) in combination with Lemma 2.7.

In a similar way, the other two statements can be derived by using Lemma 2.6 (c) and
(d), respectively, together with Lemma 2.7 in order to get the eigenvalues. Here the five
possible choices in statement (c) depend, in particular, on the value of the scalar w0 in
Lemma 2.6 (d) (namely w0 ∈ (−1, 1), w0 = 1, and w0 = −1). □
The previous result enables us to generalize Theorem 3.5 to the case where strict com-
plementarity does not hold. Note that, from now on, we use the spectral decompositions
Vi = QiDiQiᵀ and the associated partitionings (13)–(15) for all i ∈ JB, as well as those
specified in Lemma 3.6 for all indices i ∈ JB0 ∪ J0B ∪ J00.
Theorem 3.7 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT
point of the nonlinear SOCP and let the (block) index sets JI , JB, J0, JB0, J0B, J00 be defined
by (11) and (16). Suppose that for any partitioning JB0 = J1B0 ∪ J2
B0, any partitioning
J0B = J10B ∪ J2
0B, and any partitioning J00 = J100 ∪ J2
00 ∪ J300 ∪ J4
00 ∪ J500, the following two
conditions hold:
(a) The Hessian ∇2f(x∗) is positive semidefinite on the subspace

S := { d ∈ Rn | Ad = 0,
       di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
       qTi di = 0 ∀i ∈ JB ∪ J2B0 ∪ J300 ∪ J400,
       QTi di = 0 ∀i ∈ J20B ∪ J500 },

and positive definite on the subspace

S̄ := { d ∈ Rn | Ad = 0,
       di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
       qTi di = 0 ∀i ∈ J2B0 ∪ J400,
       QTi di = 0 ∀i ∈ JB ∪ J20B ∪ J300 ∪ J500 }
   = S ∩ { d ∈ Rn | QTi di = 0 ∀i ∈ JB ∪ J300 }.
(b) The matrix
( Ai (i ∈ JI ∪ J1B0 ∪ J100), AiQi (i ∈ JB ∪ J2B0 ∪ J300 ∪ J400), Aiq′i (i ∈ J20B ∪ J500) )
has linearly independent rows.
Then all matrices W ∈ ∂BM(z∗) are nonsingular.
Proof. Choose W ∈ ∂BM(z∗) arbitrarily. Then a simple calculation shows that

W = [ ∇2f(x∗)   −AT   −I
          A       0     0
        I − V     0     V ]
for a suitable block diagonal matrix V = diag(V1, . . . , Vr) with Vi ∈ ∂BPKi(x∗i − λ∗i ). In
principle, the proof is now similar to the one of Theorem 3.5: We want to apply Proposition
3.2 (with V a := I − V and V b := V ) by identifying the index sets α, β, γ. The situation
is, however, more complicated here, since these index sets may depend on the particular
element W chosen from the B-subdifferential ∂BM(z∗). To this end, we take a closer look
especially at the new index sets JB0, J0B, and J00. In view of Lemma 3.6, we further
partition these index sets into
JB0 = J1B0 ∪ J2B0,
J0B = J10B ∪ J20B,
J00 = J100 ∪ J200 ∪ J300 ∪ J400 ∪ J500

with

J1B0 := {i | Vi = I},  J2B0 := JB0 \ J1B0,
J10B := {i | Vi = 0},  J20B := J0B \ J10B

and

J100 := {i | Vi = I},
J200 := {i | Vi = 0},
J300 := {i | Vi = QiDiQTi with Di and Qi given by (13) and (14), respectively},
J400 := {i | Vi = QiDiQTi with Di = diag(0, 1, . . . , 1) and Qi = (qi, Qi)},
J500 := {i | Vi = QiDiQTi with Di = diag(0, . . . , 0, 1) and Qi = (Qi, q′i)}.
Using these definitions and Lemmas 3.4 and 3.6, we see that the following indices j belong
to the index set α in Proposition 3.2:
• All indices j belonging to one of the block indices i ∈ J0 ∪ J10B ∪ J200, with Qi = I being the corresponding orthogonal matrix.
• The first component belonging to a block index i ∈ JB ∪ J2B0 ∪ J300 ∪ J400, with qi being the first column of the corresponding orthogonal matrix Qi.

• The first ni − 1 components belonging to each block index i ∈ J20B ∪ J500, with Qi consisting of the first ni − 1 columns of the corresponding orthogonal matrix Qi.
We next consider the index set β in Proposition 3.2. In view of Lemmas 3.4 and 3.6, the
following indices j belong to this set:
• All middle indices belonging to a block index i ∈ JB ∪ J300, with Qi consisting of the
middle ni − 2 columns of the corresponding orthogonal matrix Qi.
Using Lemmas 3.4 and 3.6 again, we finally see that the following indices j belong to the
index set γ in Proposition 3.2:
• All indices j belonging to one of the block indices i ∈ JI ∪ J1B0 ∪ J100. The corresponding orthogonal matrix is Qi = I.

• The last index of each block index i ∈ JB ∪ J20B ∪ J300 ∪ J500, with q′i being the last column of the corresponding orthogonal matrix Qi.

• The last ni − 1 indices j belonging to a block index i ∈ J2B0 ∪ J400, with Qi consisting of the last ni − 1 columns of the corresponding orthogonal matrix Qi.
From these observations, it follows that the condition QTα d = 0 is equivalent to

di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
qTi di = 0 ∀i ∈ JB ∪ J2B0 ∪ J300 ∪ J400,
QTi di = 0 ∀i ∈ J20B ∪ J500.
Furthermore, the requirement QTβ d = 0 can be rewritten as

QTi di = 0 ∀i ∈ JB ∪ J300.

Since Qi = (qi, Qi) ∈ Rni×(ni−1), the two conditions QTα d = 0 and QTβ d = 0 together become

di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
qTi di = 0 ∀i ∈ J2B0 ∪ J400,
QTi di = 0 ∀i ∈ JB ∪ J20B ∪ J300 ∪ J500.
This shows that assumption (a) of this theorem coincides with the corresponding assump-
tion in Proposition 3.2.
In a similar way, we can verify the equivalence between assumption (b) in this theorem
and the corresponding condition in Proposition 3.2. In fact, the above identifications of
the index sets β and γ show that the matrix (AQβ, AQγ) has linearly independent rows if and only if the matrix

( AiQi (i ∈ JB ∪ J300), Ai (i ∈ JI ∪ J1B0 ∪ J100), Aiq′i (i ∈ JB ∪ J20B ∪ J300 ∪ J500), AiQi (i ∈ J2B0 ∪ J400) )

has linearly independent rows. Since Qi = (Qi, q′i) ∈ Rni×(ni−1), it is then easy to see that this condition is equivalent to our assumption (b). Hence the assertion follows from Proposition 3.2. □
Note that, in the case of a strictly complementary KKT point, Theorem 3.7 reduces to
Theorem 3.5. Using Theorems 3.1 and 3.7 along with [18], we get the following result.
Theorem 3.8 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT
point of the nonlinear SOCP, and suppose that the assumptions of Theorem 3.7 hold at this
KKT point. Then the nonsmooth Newton method (5) applied to the system of equations
M(z) = 0 is locally superlinearly convergent. If, in addition, f has a locally Lipschitz
continuous Hessian, then it is locally quadratically convergent.
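For concreteness, the nonsmooth Newton iteration (5) for the system M(z) = 0 takes the form stated explicitly for the linear case in Section 4:

```latex
z^{k+1} := z^k - W_k^{-1} M(z^k), \qquad W_k \in \partial_B M(z^k), \qquad k = 0, 1, 2, \ldots
```

Under the assumptions of Theorem 3.8, these iterates converge superlinearly to z∗ from any sufficiently close starting point z0, and quadratically if ∇2f is locally Lipschitz continuous.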
4 Linear Second-Order Cone Programs
We consider the linear second-order cone program (linear SOCP for short)
min cT x s.t. Ax = b, x ∈ K,
where A ∈ Rm×n, b ∈ Rm, and c ∈ Rn are the given data, and K = K1 × · · · × Kr is the
Cartesian product of second-order cones Ki ⊆ Rni with n1 + · · · + nr = n. Clearly the
linear SOCP is a special case of the nonlinear SOCP discussed in the previous section. In
particular, we may rewrite the corresponding KKT conditions as a system of equations
M0(z) = 0, where M0 : Rn × Rm × Rn → Rn × Rm × Rn is defined by

M0(z) := M0(x, µ, λ) := [ c − AT µ − λ
                          Ax − b
                          x1 − PK1(x1 − λ1)
                          ...
                          xr − PKr(xr − λr) ] .          (17)
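Each block row xi − PKi(xi − λi) of (17) only needs the projection onto a single second-order cone, which has a simple closed form based on the sign pattern of z0 versus ‖z̄‖. The following minimal NumPy sketch (the function name proj_soc and the test data are ours, not from the paper) implements this projection and checks the Moreau decomposition z = PK(z) − PK(−z), which holds because K is self-dual.

```python
import numpy as np

def proj_soc(z):
    """Projection of z = (z0, z_bar) onto the second-order cone
    K = {(z0, z_bar) : z0 >= ||z_bar||}."""
    z0, zbar = z[0], z[1:]
    r = np.linalg.norm(zbar)
    if z0 >= r:                       # z already lies in K
        return z.copy()
    if z0 <= -r:                      # z lies in the polar cone -K
        return np.zeros_like(z)
    t = 0.5 * (z0 + r)                # positive spectral value of z
    return np.concatenate(([t], (t / r) * zbar))

# Moreau decomposition: since K is self-dual, z = P_K(z) - P_K(-z).
z = np.array([0.3, -1.2, 0.8])
p = proj_soc(z)
assert np.allclose(z, p - proj_soc(-z))
assert p[0] >= np.linalg.norm(p[1:]) - 1e-12   # P_K(z) lies in K
```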
An immediate consequence of Theorem 3.1 is the fact that the mapping M0 is strongly
semismooth. On the other hand, Theorems 3.5 and 3.7 do not apply to the linear SOCP,
since the second-order conditions do not hold. We therefore need another suitable condition
that guarantees the required nonsingularity of the (generalized) Jacobian of M0 at a KKT
point z∗ = (x∗, µ∗, λ∗). Before presenting such a result, we follow the presentation of the
previous section and first state a general proposition.
Proposition 4.1 Let A ∈ Rm×n. Let V a, V b ∈ Rn×n be two symmetric positive semidefinite matrices such that their sum V a + V b is positive definite and V a and V b have a common basis of eigenvectors, so that there exist an orthogonal matrix Q ∈ Rn×n and diagonal matrices Da = diag(a1, . . . , an), Db = diag(b1, . . . , bn) satisfying V a = QDaQT , V b = QDbQT and aj ≥ 0, bj ≥ 0, aj + bj > 0 for all j = 1, . . . , n. Let the index set {1, . . . , n} be partitioned as {1, . . . , n} = α ∪ β ∪ γ with

α := {j | aj > 0, bj = 0},  β := {j | aj > 0, bj > 0},  γ := {j | aj = 0, bj > 0}.
Assume that the following two conditions hold:
(a) The matrix AQγ has full column rank.
(b) The matrix (AQβ, AQγ) has full row rank.
Then the matrix

W0 := [   0    −AT   −I
          A     0     0
         V a    0    V b ]

is nonsingular.
Proof. Let y = (y(1), y(2), y(3)) ∈ Rn × Rm × Rn be an arbitrary vector such that W0y = 0.
Then
AT y(2) + y(3) = 0, (18)
Ay(1) = 0, (19)
V ay(1) + V by(3) = 0. (20)
Using the spectral decompositions of V a and V b, equation (20) can be rewritten as
Day(1) + Dby(3) = 0 with y(1) := QT y(1), y(3) := QT y(3).
Taking into account the definitions of the index sets α, β, and γ, we obtain
y(1)α = 0, y(3)γ = 0, and Daβ y(1)β + Dbβ y(3)β = 0.    (21)
Premultiplying (18) by QT , using the definitions of y(1), y(3), and writing down the resulting
equations componentwise for each block corresponding to the indices in α, β, and γ, we get
QTα AT y(2) + y(3)α = 0,    (22)
QTβ AT y(2) + y(3)β = 0,    (23)
QTγ AT y(2) = 0.    (24)
Furthermore, equation (19) can be written as
0 = Ay(1) = AQQT y(1) = AQ y(1) = AQβ y(1)β + AQγ y(1)γ .    (25)
Using the nonsingularity of the diagonal submatrices Daβ, Dbβ, we obtain from (21) and (23) that

y(1)β = −(Daβ)−1Dbβ y(3)β = (Daβ)−1Dbβ QTβ AT y(2).
Substituting this expression for y(1)β in (25) yields

AQβ(Daβ)−1Dbβ QTβ AT y(2) + AQγ y(1)γ = 0.
Premultiplying by (y(2))T and using (24), we have
(y(2))T AQβ(Daβ)−1Dbβ QTβ AT y(2) = 0.
However, by the definition of the index set β, the matrix (Daβ)−1Dbβ is positive definite.
Consequently, we obtain
QTβ AT y(2) = 0. (26)
Then (23) implies y(3)β = 0, and (21) yields y(1)β = 0. This, in turn, gives AQγ y(1)γ = 0 by (25). Assumption (a) then gives y(1)γ = 0. Furthermore, (26) and (24) together with assumption (b) yield y(2) = 0, which also implies y(3)α = 0 by (22). Thus we have y(1) = 0 and y(3) = 0. Then it is easy to deduce that y = (y(1), y(2), y(3)) = (0, 0, 0), i.e., W0 is nonsingular. □
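Proposition 4.1 is easy to sanity-check numerically: build V a and V b from a common eigenbasis, pick eigenvalue patterns that realize the sets α, β, γ, verify assumptions (a) and (b), and test that W0 has full rank. The NumPy sketch below uses arbitrary illustrative data (ours, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2

# common eigenbasis Q and eigenvalue patterns defining alpha, beta, gamma
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))     # orthogonal Q
a = np.array([1.0, 2.0, 0.5, 1.5, 0.0])              # a_j >= 0
b = np.array([0.0, 0.0, 1.0, 2.0, 3.0])              # b_j >= 0, a_j + b_j > 0
alpha, beta, gamma = [0, 1], [2, 3], [4]             # from the sign patterns above
Va = Q @ np.diag(a) @ Q.T
Vb = Q @ np.diag(b) @ Q.T

A = rng.standard_normal((m, n))
# assumption (a): A Q_gamma has full column rank
assert np.linalg.matrix_rank(A @ Q[:, gamma]) == len(gamma)
# assumption (b): (A Q_beta, A Q_gamma) has full row rank
assert np.linalg.matrix_rank(A @ Q[:, beta + gamma]) == m

W0 = np.block([
    [np.zeros((n, n)), -A.T,             -np.eye(n)],
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [Va,               np.zeros((n, m)), Vb],
])
assert np.linalg.matrix_rank(W0) == 2 * n + m        # W0 is nonsingular
```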
We now come back to the linear SOCP. First consider the case where z∗ = (x∗, µ∗, λ∗) is a
strictly complementary KKT point, and use the index sets JI , JB, and J0 defined by (11)
in order to partition any given vector in Rn in a suitable way. Let Vi := P′Ki(x∗i − λ∗i ).
Similarly to the discussion in the previous section, we have
Vi = I ∀i ∈ JI and Vi = 0 ∀i ∈ J0. (27)
Moreover, for each i ∈ JB, let Vi = QiDiQTi be the spectral decomposition of the matrix
Vi. For the matrices Di and Qi, i ∈ JB, we use the same decomposition as in (13)–(15).
Using this notation, we are now in a position to state the following nonsingularity
result which may be viewed as a counterpart of the corresponding result for interior-point
methods given in [1, Theorem 28]. Recall that Ai denotes the submatrix of A consisting
of those columns which correspond to the block index i.
Theorem 4.2 Let z∗ = (x∗, µ∗, λ∗) be a strictly complementary KKT point of the linear
SOCP. Suppose further that the following conditions hold:
(a) The matrix (Ai (i ∈ JI), Aiq′i (i ∈ JB)) has linearly independent columns.

(b) The matrix (Ai (i ∈ JI), AiQi (i ∈ JB)) has linearly independent rows.
Then the Jacobian M′0(z∗) exists and is nonsingular.
Proof. We have

M′0(z∗) = [   0    −AT  −I
              A     0    0
            I − V   0    V  ]

for a block diagonal matrix V = diag(V1, . . . , Vr) with Vi := P′Ki(x∗i − λ∗i ). Let V a := I − V
and V b := V . It then follows that V a, V b are both symmetric positive semidefinite with
a positive definite sum V a + V b, and these two matrices obviously have a common basis
of eigenvectors. Hence we are in the situation of Proposition 4.1. In order to apply this
result, we need to take a closer look at the orthogonal matrix and the special structure of
the eigenvalues in the case that is currently under consideration.
We have
V a = Q(I −D)QT and V b = QDQT
with the block matrices
Q := diag(Q1, . . . , Qr) and D := diag(D1, . . . , Dr),

where the orthogonal blocks Qi are identity matrices for all indices i ∈ JI ∪ J0, cf. (27), whereas for all i ∈ JB, these orthogonal matrices are given by (14) or (15), and the corresponding diagonal matrices Di are given by (13).
In order to apply Proposition 4.1, we need to identify the index sets β and γ. From
(27) and (13), we see that the following indices comprise the index set β:
• all middle indices j belonging to a block index i ∈ JB, with Qi consisting of the
middle ni − 2 columns of the corresponding orthogonal matrix Qi.
Similarly, we see that the following indices comprise the index set γ:
• all indices j belonging to a block index i ∈ JI , with Qi = I being the corresponding
orthogonal matrix.
• the last component j of each block index i ∈ JB, with q′i being the last column of the
corresponding orthogonal matrix Qi.
Using these identifications, it is easy to see that assumptions (a) and (b) in Proposition
4.1 are identical to the two conditions in this theorem. Consequently the assertion follows
from Proposition 4.1. □
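The spectral structure used in this proof can also be checked numerically. For i ∈ JB the point si := x∗i − λ∗i satisfies |si0| < ‖s̄i‖, so PKi is differentiable there, with derivative given by the standard closed-form expression for the SOC projection. The NumPy sketch below (notation ours, not the paper's) evaluates this derivative, validates it against a finite-difference quotient, and confirms the eigenvalue pattern 0, then (si0 + ‖s̄i‖)/(2‖s̄i‖) with multiplicity ni − 2, then 1, matching the index sets β and γ identified above.

```python
import numpy as np

def proj_soc(z):
    z0, zb = z[0], z[1:]
    r = np.linalg.norm(zb)
    if z0 >= r:  return z.copy()
    if z0 <= -r: return np.zeros_like(z)
    t = 0.5 * (z0 + r)
    return np.concatenate(([t], (t / r) * zb))

def jac_proj_soc(s):
    """Derivative of the SOC projection at s = (s0, s_bar)
    with |s0| < ||s_bar|| (the smooth region)."""
    s0, sbar = s[0], s[1:]
    r = np.linalg.norm(sbar)
    w = sbar / r
    n = s.size
    V = np.empty((n, n))
    V[0, 0] = 0.5
    V[0, 1:] = 0.5 * w
    V[1:, 0] = 0.5 * w
    V[1:, 1:] = ((s0 + r) / (2 * r)) * np.eye(n - 1) \
                - (s0 / (2 * r)) * np.outer(w, w)
    return V

s = np.array([0.5, 2.0, 1.0, -2.0])          # |s0| < ||s_bar|| = 3
V = jac_proj_soc(s)

# finite-difference check of the derivative
h = 1e-6
d = np.random.default_rng(0).standard_normal(s.size)
fd = (proj_soc(s + h * d) - proj_soc(s - h * d)) / (2 * h)
assert np.allclose(V @ d, fd, atol=1e-6)

# eigenvalues: 0, then (s0 + r)/(2r) with multiplicity n - 2, then 1
r = np.linalg.norm(s[1:])
expected = np.sort(np.concatenate(([0.0, 1.0],
                                   np.full(s.size - 2, (s[0] + r) / (2 * r)))))
assert np.allclose(np.sort(np.linalg.eigvalsh(V)), expected)
```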
We next generalize Theorem 4.2 to the case where strict complementarity is not satisfied. In addition to JI , JB, J0, we use the index sets JB0, J0B, J00 defined by (16).
Theorem 4.3 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT point of the linear SOCP. Suppose that for any partitioning JB0 = J1B0 ∪ J2B0, any subset J20B ⊆ J0B, and any mutually disjoint subsets J100, J300, J400, J500 ⊆ J00, the following two conditions hold:
(a) The matrix

( Ai (i ∈ JI ∪ J1B0 ∪ J100), AiQi (i ∈ J2B0 ∪ J400), Aiq′i (i ∈ JB ∪ J20B ∪ J300 ∪ J500) )

has linearly independent columns.
(b) The matrix

( Ai (i ∈ JI ∪ J1B0 ∪ J100), AiQi (i ∈ JB ∪ J2B0 ∪ J300 ∪ J400), Aiq′i (i ∈ J20B ∪ J500) )

has linearly independent rows.
Then all matrices W0 ∈ ∂BM0(z∗) are nonsingular.
Proof. Choose W0 ∈ ∂BM0(z∗) arbitrarily. Then

W0 = [   0    −AT  −I
         A     0    0
       I − V   0    V  ]

for a block diagonal matrix V = diag(V1, . . . , Vr) with Vi ∈ ∂BPKi(x∗i − λ∗i ). Following the proof of Theorem 4.2, we define V a := I − V and V b := V and see that these two matrices
satisfy the assumptions of Proposition 4.1. Similarly to the proof of Theorem 4.2, we also
have
V a = Q(I −D)QT and V b = QDQT
with the block matrices
Q := diag(Q1, . . . , Qr) and D := diag(D1, . . . , Dr).
Then we need to identify the index sets α, β and γ in Proposition 4.1. To this end, we
follow the proof of Theorem 3.7 and partition the index sets JB0, J0B, and J00 further into
certain subsets in a similar manner. Then, as in the proof of Theorem 3.7, we are able
to show which indices j belong to the sets α, β, and γ. Moreover, it is not difficult to
see that the two assumptions (a) and (b) in this theorem correspond precisely to the two
requirements (a) and (b) in Proposition 4.1. We leave the details to the reader (note that
assumption (b) coincides with the corresponding assumption in Theorem 3.7, so there is
nothing to verify for this part). □
Note that a corresponding result for interior-point methods does not hold since the Jacobian
matrices arising in that context are singular whenever strict complementarity does not hold.
As a consequence of Theorem 4.3 and [19], we obtain the following theorem.
Theorem 4.4 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT
point of the linear SOCP, and suppose that the assumptions of Theorem 4.3 hold at this
KKT point. Then the nonsmooth Newton method zk+1 := zk − (Wk)−1M0(zk), with Wk ∈ ∂BM0(zk), applied to the system of equations M0(z) = 0 is locally quadratically convergent.
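To illustrate Theorem 4.4, here is a self-contained NumPy sketch of the iteration zk+1 := zk − (Wk)−1M0(zk) on a tiny linear SOCP with one cone K ⊆ R3, A = (1, 0, 0), b = 1, and c = (0, 1, 0), whose KKT point is x∗ = (1, −1, 0), µ∗ = −1, λ∗ = (1, 1, 0) with both x∗ and λ∗ on the boundary of K (a JB block). The problem data, the starting point, and the particular element chosen from ∂BPK at the kinks are our own illustrative choices; the case distinction in jac_proj_soc follows the standard formulas for the SOC projection.

```python
import numpy as np

def proj_soc(z):
    z0, zb = z[0], z[1:]
    r = np.linalg.norm(zb)
    if z0 >= r:  return z.copy()
    if z0 <= -r: return np.zeros_like(z)
    t = 0.5 * (z0 + r)
    return np.concatenate(([t], (t / r) * zb))

def jac_proj_soc(z):
    """One element of the B-subdifferential of P_K at z."""
    n = z.size
    z0, zb = z[0], z[1:]
    r = np.linalg.norm(zb)
    if r == 0.0:                       # kink on the axis: pick I or 0
        return np.eye(n) if z0 > 0 else np.zeros((n, n))
    if z0 >= r:  return np.eye(n)      # on bd K we pick the identity
    if z0 <= -r: return np.zeros((n, n))
    w = zb / r
    V = np.empty((n, n))
    V[0, 0] = 0.5
    V[0, 1:] = 0.5 * w
    V[1:, 0] = 0.5 * w
    V[1:, 1:] = ((z0 + r) / (2 * r)) * np.eye(n - 1) - (z0 / (2 * r)) * np.outer(w, w)
    return V

def M0(z, A, b, c):
    m, n = A.shape
    x, mu, lam = z[:n], z[n:n + m], z[n + m:]
    return np.concatenate([c - A.T @ mu - lam, A @ x - b, x - proj_soc(x - lam)])

def newton_step(z, A, b, c):
    m, n = A.shape
    x, lam = z[:n], z[n + m:]
    V = jac_proj_soc(x - lam)
    I = np.eye(n)
    W = np.block([[np.zeros((n, n)), -A.T,             -I],
                  [A,                np.zeros((m, m)), np.zeros((m, n))],
                  [I - V,            np.zeros((n, m)), V]])
    return z - np.linalg.solve(W, M0(z, A, b, c))

A = np.array([[1.0, 0.0, 0.0]])
b = np.array([1.0])
c = np.array([0.0, 1.0, 0.0])

# start near the KKT point z* = (x*, mu*, lam*) = ((1,-1,0), -1, (1,1,0))
z = np.concatenate([[1.1, -0.9, 0.1], [-0.8], [0.9, 1.1, -0.1]])
for _ in range(20):
    z = newton_step(z, A, b, c)
assert np.linalg.norm(M0(z, A, b, c)) < 1e-8
assert np.allclose(z[:3], [1.0, -1.0, 0.0], atol=1e-6)
```

The final assertions check the behavior Theorem 4.4 predicts: started close enough to z∗, the iterates drive the residual ‖M0(z)‖ to (essentially) machine precision in a handful of steps.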
5 Final Remarks
We have investigated the local properties of a semismooth equation reformulation of both
the linear and the nonlinear second-order cone programs. In particular, we have shown
nonsingularity results that provide basic conditions for local quadratic convergence of a
nonsmooth Newton method. Strict complementarity of a solution is not needed in our
nonsingularity results. Apart from these local properties, it is certainly of interest to see
how the local Newton method can be globalized in a suitable way. We leave this as a topic for future research.
References
[1] F. Alizadeh and D. Goldfarb: Second-order cone programming. Mathematical
Programming 95, 2003, pp. 3–51.
[2] A. Ben-Tal and A. Nemirovski: Lectures on Modern Convex Optimization. MPS-
SIAM Series on Optimization, SIAM, Philadelphia, PA, 2001.
[3] J.F. Bonnans and H. Ramírez C.: Perturbation analysis of second-order cone
programming problems. Mathematical Programming 104, 2005, pp. 205–227.
[4] S. Boyd and L. Vandenberghe: Convex Optimization. Cambridge University
Press, Cambridge, 2004.
[5] X.D. Chen, D. Chen, and J. Sun: Complementarity functions and numerical
experiments for second-order cone complementarity problems. Computational Opti-
mization and Applications 25, 2003, pp. 39–56.
[6] J.-S. Chen: Alternative proofs for some results of vector-valued functions associated
with second-order cones. Journal of Nonlinear and Convex Analysis 6, 2005, pp. 297–
325.
[7] J.-S. Chen, X. Chen, and P. Tseng: Analysis of nonsmooth vector-valued func-
tions associated with second-order cones. Mathematical Programming 101, 2004, pp.
95–117.
[8] J.-S. Chen and P. Tseng: An unconstrained smooth minimization reformulation
of the second-order cone complementarity problem. Mathematical Programming 104,
2005, pp. 293–327.
[9] F.H. Clarke: Optimization and Nonsmooth Analysis. John Wiley & Sons, New
York, NY, 1983 (reprinted by SIAM, Philadelphia, PA, 1990).
[10] F. Facchinei and J.S. Pang: Finite-Dimensional Variational Inequalities and
Complementarity Problems, Volume II. Springer, New York, NY, 2003.
[11] M. Fukushima, Z.-Q. Luo, and P. Tseng: Smoothing functions for second-order-
cone complementarity problems. SIAM Journal on Optimization 12, 2001, pp. 436–460.
[12] S. Hayashi, N. Yamashita, and M. Fukushima: A combined smoothing and reg-
ularization method for monotone second-order cone complementarity problems. SIAM
Journal on Optimization 15, 2005, pp. 593–615.
[13] C. Helmberg: http://www-user.tu-chemnitz.de/~helmberg/semidef.html.
[14] H. Kato and M. Fukushima: An SQP-type algorithm for nonlinear second-order
cone programs. Technical Report 2006-004, Department of Applied Mathematics and
Physics, Kyoto University, March 2006.
[15] M.S. Lobo, L. Vandenberghe, and S. Boyd: SOCP – Software for second-
order cone programming. User’s guide. Technical Report, Department of Electrical
Engineering, Stanford University, April 1997.
[16] M.S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret: Applications of
second-order cone programming. Linear Algebra and Its Applications 284, 1998, pp.
193–228.
[17] J.-S. Pang and L. Qi: Nonsmooth equations: motivation and algorithms. SIAM
Journal on Optimization 3, 1993, pp. 443–465.
[18] L. Qi: Convergence analysis of some algorithms for solving nonsmooth equations.
Mathematics of Operations Research 18, 1993, pp. 227–244.
[19] L. Qi and J. Sun: A nonsmooth version of Newton’s method. Mathematical Pro-
gramming 58, 1993, pp. 353–367.
[20] J.F. Sturm: Using SeDuMi 1.02, a MATLAB toolbox for optimization over
symmetric cones. Optimization Methods and Software 11/12, 1999, pp. 625–653.
http://sedumi.mcmaster.ca/
[21] D. Sun and J. Sun: Strong semismoothness of the Fischer-Burmeister SDC and
SOC complementarity functions. Mathematical Programming 103, 2005, pp. 575–581.
[22] P. Tseng: Smoothing methods for second-order cone programs/complementarity
problems. Talk presented at the SIAM Conference on Optimization, Stockholm, May
2005.
[23] R.H. Tütüncü, K.C. Toh and M.J. Todd: Solving semidefinite-quadratic-
linear programs using SDPT3. Mathematical Programming 95, 2003, pp. 189–217.
http://www.math.nus.edu.sg/~mattohkc/sdpt3.html
[24] R.J. Vanderbei and H. Yurttan: Using LOQO to solve second-order cone pro-
gramming problems. Technical Report, Statistics and Operations Research, Princeton
University, 1998.
[25] H. Yamashita and H. Yabe: A primal-dual interior point method for nonlinear
optimization over second order cones. Technical Report, Mathematical Systems Inc.,
Tokyo, May 2005 (revised February 2006).