SEMISMOOTH METHODS FOR LINEAR AND NONLINEAR
SECOND-ORDER CONE PROGRAMS1
Christian Kanzow2 and Masao Fukushima3
April 19, 2006
Abstract
The optimality conditions of a nonlinear second-order cone program can be reformulated as a nonsmooth system of equations using a projection mapping. This allows the application of nonsmooth Newton methods for the solution of the nonlinear second-order cone program. Conditions for the local quadratic convergence of these nonsmooth Newton methods are investigated. Related conditions are also given for the special case of a linear second-order cone program. An interesting and important feature of these conditions is that they do not require strict complementarity of the solution.
Key Words: Linear second-order cone program, nonlinear second-order cone program,
semismooth function, nonsmooth Newton method, quadratic convergence
1This work was supported in part by the Scientific Research Grant-in-Aid from the Japan Society for the Promotion of Science.
2Institute of Mathematics, University of Würzburg, Am Hubland, 97074 Würzburg, Germany, e-mail: [email protected]
3Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan, e-mail: [email protected]
1 Introduction
We consider both the linear second-order cone program
min cᵀx s.t. Ax = b, x ∈ K,
and the nonlinear second-order cone program
min f(x) s.t. Ax = b, x ∈ K,
where f : Rn → R is a twice continuously differentiable function, A ∈ Rm×n is a given
matrix, b ∈ Rm and c ∈ Rn are given vectors, and
K = K1 × · · · × Kr
is a Cartesian product of second-order cones Ki ⊆ Rni , n1 + · · · + nr = n. Recall that the
second-order cone (or ice-cream cone or Lorentz cone) of dimension ni is defined by
Ki := {zi = (zi0, zi) ∈ R × Rni−1 | zi0 ≥ ‖zi‖},
where ‖ · ‖ denotes the Euclidean norm. Observe the special notation that is used in the
definition of Ki and that will be applied throughout this manuscript: For a given vector
z ∈ Rℓ for some ℓ ≥ 1, we write z = (z0, z), where z0 is the first component of the vector
z, and z consists of the remaining ℓ − 1 components of z.
The linear second-order cone program has been investigated in many previous works,
and we refer the interested reader to the two survey papers [16, 1] and the books [2, 4]
for many important applications and theoretical properties. Software for the solution of
linear second-order cone programs is also available, see, for example, [15, 24, 20, 23]. In
many cases, the linear second-order cone program is viewed as a special case of a (linear)
semidefinite program (see [1] for a suitable reformulation). However, we feel that the
second-order cone program should be treated directly since the reformulation of a second-
order cone constraint as a semidefinite constraint increases the dimension of the problem
significantly and, therefore, decreases the efficiency of any solver. In fact, many solvers for
semidefinite programs (see, for example, the list given on Helmberg’s homepage [13]) are
able to deal with second-order cone constraints separately.
The treatment of the nonlinear second-order cone program is much more recent, and,
at the moment, the number of publications is rather limited, see [3, 5, 6, 7, 8, 11, 12, 14,
21, 22, 25]. These papers deal with different topics; some of them investigate different
kinds of solution methods (interior-point methods, smoothing methods, SQP-type meth-
ods, or methods based on unconstrained optimization), while some of them consider certain
theoretical properties or suitable reformulations of the second-order cone program.
The method of choice for the solution of (at least) the linear second-order cone program
is currently an interior-point method. However, some recent preliminary tests indicate that
the class of smoothing or semismooth methods is sometimes superior to the class of interior-
point methods, especially for nonlinear problems, see [8, 12, 22]. On the other hand, the
theoretical properties of interior-point methods are much better understood than those of
the smoothing and semismooth methods.
The aim of this paper is to provide some results which help to understand the the-
oretical properties of semismooth methods being applied to both linear and nonlinear
second-order cone programs. The investigation here is of local nature, and we provide
sufficient conditions for those methods to be locally quadratically convergent. An interest-
ing and important feature of those sufficient conditions is that they do not require strict
complementarity of the solution. This is an advantage compared to interior-point methods
where singular Jacobians occur if strict complementarity is not satisfied.
The paper is organized as follows: Section 2 states a number of preliminary results
for a projection mapping onto a second-order cone, which will later be used in order to
reformulate the optimality conditions of the second-order cone program as a system of
equations. Section 3 then investigates the nonlinear second-order cone program, whereas
its linear counterpart is discussed in Section 4. We close with some final remarks in
Section 5.
Most of our notation is standard. For a differentiable mapping G : Rn → Rm, we denote
by G′(z) ∈ Rm×n the Jacobian of G at z. If G is locally Lipschitz continuous, the set
∂BG(z) := {H ∈ Rm×n | ∃ {zk} ⊆ DG : zk → z, G′(zk) → H}

is nonempty and called the B-subdifferential of G at z, where DG ⊆ Rn denotes the
set of points at which G is differentiable. The convex hull ∂G(z) := conv ∂BG(z) is the
generalized Jacobian of Clarke [9]. We assume that the reader is familiar with the concepts
of (strongly) semismooth functions, and refer to [19, 18, 17, 10] for details.
2 Projection Mapping onto Second-Order Cone
Throughout this section, let K be the single second-order cone
K := {z = (z0, z) ∈ R × Rn | z0 ≥ ‖z‖}.
In the subsequent sections, K will be the Cartesian product of second-order cones. The
results of this section will later be applied componentwise to each of the second-order cones
Ki in the Cartesian product.
Recall that the second-order cone K is self-dual, i.e. K∗ = K, where K∗ := {d ∈ R × Rn | zᵀd ≥ 0 ∀z ∈ K} denotes the dual cone of K, cf. [1, Lemma 1]. Hence the following result holds, see, e.g., [11, Proposition 4.1].
Lemma 2.1 The following equivalence holds:
x ∈ K, y ∈ K, xᵀy = 0 ⇐⇒ x − PK(x − y) = 0,
where PK(z) denotes the (Euclidean) projection of a vector z on K.
An explicit representation of the projection PK(z) is given in the following result, see [11,
Proposition 3.3].
Lemma 2.2 For any given z = (z0, z) ∈ R × Rn, we have

PK(z) = max{0, λ1} u(1) + max{0, λ2} u(2),

where λ1, λ2 are the spectral values and u(1), u(2) are the spectral vectors of z, respectively, given by

λ1 := z0 − ‖z‖, λ2 := z0 + ‖z‖,

u(1) := (1/2) (1, −z/‖z‖) if z ≠ 0, and u(1) := (1/2) (1, −w) if z = 0,

u(2) := (1/2) (1, z/‖z‖) if z ≠ 0, and u(2) := (1/2) (1, w) if z = 0,

where w is any vector in Rn with ‖w‖ = 1.
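The projection formula of Lemma 2.2 is straightforward to implement. The following NumPy sketch (the function name proj_soc and all naming conventions are ours, not from the paper) computes PK(z) via the spectral values and vectors:

```python
import numpy as np

def proj_soc(z):
    """Projection of z = (z0, zbar) onto the second-order cone
    K = {(z0, zbar) : z0 >= ||zbar||}, via the spectral
    decomposition of Lemma 2.2."""
    z0, zbar = z[0], np.asarray(z[1:], dtype=float)
    nz = np.linalg.norm(zbar)
    lam1, lam2 = z0 - nz, z0 + nz                  # spectral values
    if nz > 0:
        w = zbar / nz
    else:                                          # zbar = 0: any unit vector works
        w = np.zeros_like(zbar)
        w[0] = 1.0
    u1 = 0.5 * np.concatenate(([1.0], -w))         # spectral vectors
    u2 = 0.5 * np.concatenate(([1.0], w))
    return max(0.0, lam1) * u1 + max(0.0, lam2) * u2
```

As a consistency check, x := PK(z) and y := x − z satisfy the complementarity relation of Lemma 2.1 (x ∈ K, y ∈ K, xᵀy = 0), which follows from the Moreau decomposition of z with respect to the self-dual cone K.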
It is well-known that the projection mapping onto an arbitrary closed convex set is non-
expansive and hence is Lipschitz continuous. When the set is the second-order cone K, a
stronger smoothness property can be shown, see [12, Proposition 4.5].
Lemma 2.3 The projection mapping PK is strongly semismooth.
We next characterize the points at which the projection mapping PK is differentiable.
Lemma 2.4 The projection mapping PK is differentiable at a point z = (z0, z) ∈ R × Rn if
and only if z0 ≠ ±‖z‖ holds. In fact, the projection mapping is continuously differentiable
at every z such that z0 ≠ ±‖z‖.
Proof. The statement can be derived directly from the representation of PK(z) given in
Lemma 2.2. Alternatively, it can be derived as a special case of more general results stated
in [7], see, in particular, Propositions 4 and 5 in that reference. □
We next calculate the Jacobian of the projection mapping PK at a point where it is differ-
entiable.
Lemma 2.5 The Jacobian of PK at a point z = (z0, z) ∈ R × Rn with z0 ≠ ±‖z‖ is given by

P′K(z) = 0 if z0 < −‖z‖,

P′K(z) = In+1 if z0 > +‖z‖,

P′K(z) = (1/2) [ 1 wᵀ ; w H ] if −‖z‖ < z0 < +‖z‖,

where

w := z/‖z‖, H := (z0/‖z‖ + 1) In − (z0/‖z‖) wwᵀ.

(Note that the denominator is automatically nonzero in this case.)
Proof. First assume that z ≠ 0 holds.

Consider the case z0 < −‖z‖. This implies not only z0 + ‖z‖ < 0 but also z0 − ‖z‖ < 0.
Therefore, in view of Lemma 2.2, it follows that PK is the zero mapping in a neighborhood
of the given point z. Hence the Jacobian at z is the zero matrix.

Next consider the case z0 > +‖z‖. Then z belongs to the interior of the cone K and,
therefore, PK is the identity mapping in a small neighborhood of z (this can also be verified
by direct calculation using Lemma 2.2 once again). Hence the Jacobian at z is the identity
matrix.

Now let z0 ∈ (−‖z‖, +‖z‖). Then z0 + ‖z‖ > 0 and z0 − ‖z‖ < 0. Consequently, using
Lemma 2.2 again, we have (locally)

PK(z) = (1/2)(z0 + ‖z‖) (1, z/‖z‖).

Let us write

θ(z) := z/‖z‖.

Since z ≠ 0, it is easy to see that θ is differentiable at z with Jacobian

θ′(z) = (1/‖z‖²) (‖z‖ In − (1/‖z‖) zzᵀ).

Hence we obtain

P′K(z) = (1/2)(z0 + ‖z‖) [ 0 0 ; 0 θ′(z) ] + (1/2) [ 1 ; z/‖z‖ ] (1, zᵀ/‖z‖)

= (1/2)(z0 + ‖z‖) [ 0 0 ; 0 (1/‖z‖²)(‖z‖ In − (1/‖z‖) zzᵀ) ] + (1/2) [ 1 zᵀ/‖z‖ ; z/‖z‖ zzᵀ/‖z‖² ]

= (1/2) [ 1 zᵀ/‖z‖ ; z/‖z‖ (z0/‖z‖ + 1) In − (z0/‖z‖) zzᵀ/‖z‖² ],

which is precisely the statement in this case.

Finally, let us consider the case where z = 0. Then the statement is a special case of
[11, Proposition 5.2]. However, since PK is continuously differentiable at (z0, 0) for any
z0 ≠ 0 by Lemma 2.4, the Jacobian at (z0, z) can also be derived by taking a limit of a
sequence {zk} = {(zk0, zk)} converging to z = (z0, 0) and satisfying zk ≠ 0. □
Based on the previous results, we are now in a position to give an expression for the elements of the B-subdifferential ∂BPK(z) at an arbitrary point z. A similar representation of the elements of the Clarke generalized Jacobian ∂PK(z) is given in [12, Proposition 4.8], without discussion of conditions for the nonsingularity of those matrices. Here we prefer to deal with the smaller set ∂BPK(z) since this will simplify our subsequent analysis to give sufficient conditions for the nonsingularity of all elements in ∂BPK(z). In fact, the nonsingularity of all elements of the B-subdifferential usually holds under weaker assumptions than the nonsingularity of all elements of the corresponding Clarke generalized Jacobian.
Lemma 2.6 Given a general point z = (z0, z) ∈ R × Rn, each element V ∈ ∂BPK(z) has
the following representation:

(a) If z0 ≠ ±‖z‖, then PK is continuously differentiable at z and V = P′K(z) with the
Jacobian P′K(z) given in Lemma 2.5.

(b) If z ≠ 0 and z0 = +‖z‖, then

V ∈ { In+1, (1/2) [ 1 wᵀ ; w H ] },

where w := z/‖z‖ and H := 2In − wwᵀ.

(c) If z ≠ 0 and z0 = −‖z‖, then

V ∈ { 0, (1/2) [ 1 wᵀ ; w H ] },

where w := z/‖z‖ and H := wwᵀ.

(d) If z = 0 and z0 = 0, then either V = 0 or V = In+1 or V belongs to the set

{ (1/2) [ 1 wᵀ ; w H ] | H = (w0 + 1)In − w0wwᵀ for some |w0| ≤ 1 and ‖w‖ = 1 }.
Proof. Throughout the proof, we denote by D the set of points where the mapping PK is differentiable. Recall that this set is characterized in Lemma 2.4.
(a) Under the stated assumptions on (z0, z), it follows from Lemma 2.4 that PK is continu-
ously differentiable at z. Hence the B-subdifferential consists of one element only, namely
the Jacobian P ′K(z). This observation gives the first statement.
(b) Let z ≠ 0 and z0 = +‖z‖. Furthermore, let {zk} ⊆ D be an arbitrary sequence
converging to z. Writing zk = (zk0, zk), we can assume without loss of generality that, for
each k ∈ N, we have either −‖zk‖ < zk0 < ‖zk‖ or zk0 > ‖zk‖. If zk0 > ‖zk‖ for all k ∈ N,
we have P′K(zk) = In+1 by Lemma 2.5 and, therefore, P′K(zk) → In+1 as k → ∞. On the
other hand, if −‖zk‖ < zk0 < ‖zk‖ for all k ∈ N, Lemma 2.5 shows that

P′K(zk) = (1/2) [ 1 (wk)ᵀ ; wk Hk ]

for all k ∈ N, where

wk := zk/‖zk‖ and Hk := (zk0/‖zk‖ + 1) In − (zk0/‖zk‖) wk(wk)ᵀ. (1)

Taking the limit k → ∞, we therefore obtain

P′K(zk) → (1/2) [ 1 wᵀ ; w H ] with w := z/‖z‖, H := 2In − wwᵀ.

Finally, if the sequence {zk} is such that zk0 > ‖zk‖ for some k ∈ N and −‖zk‖ < zk0 < ‖zk‖
for the remaining k ∈ N, we do not get any other limiting elements for P′K(zk).
(c) Let z ≠ 0 and z0 = −‖z‖. Moreover, let {zk} ⊆ D be any sequence converging to z.
Writing zk = (zk0, zk) for each k ∈ N, we may assume without loss of generality that, for
each k ∈ N, we have either zk0 < −‖zk‖ or −‖zk‖ < zk0 < ‖zk‖. For the same reason as in
the proof of part (b), it suffices to consider the two cases: (i) zk0 < −‖zk‖ for all k ∈ N,
and (ii) −‖zk‖ < zk0 < ‖zk‖ for all k ∈ N. In the former case, it follows from Lemma 2.5
that P′K(zk) = 0 for all k ∈ N, hence P′K(zk) → 0 as k → ∞. In the latter case, it also
follows from Lemma 2.5 that

P′K(zk) = (1/2) [ 1 (wk)ᵀ ; wk Hk ]

with wk, Hk given by (1). Taking the limit k → ∞ gives

P′K(zk) → (1/2) [ 1 wᵀ ; w H ] with w := z/‖z‖, H := wwᵀ.
(d) Let z = 0 and z0 = 0. Let {zk} = {(zk0, zk)} ⊆ D be any sequence converging to
z = (z0, z) = (0, 0). Then there are three possibilities: (i) zk0 < −‖zk‖, in which case we
have P′K(zk) = 0 by Lemma 2.5, (ii) zk0 > ‖zk‖, in which case we have P′K(zk) = In+1 by
Lemma 2.5, and (iii) −‖zk‖ < zk0 < ‖zk‖, in which case we have

P′K(zk) = (1/2) [ 1 (wk)ᵀ ; wk Hk ]

with wk, Hk given by (1), again by Lemma 2.5. Taking the limit (possibly on a subsequence)
in each of these cases, we have either P′K(zk) → 0 or P′K(zk) → In+1 or

P′K(zk) → (1/2) [ 1 wᵀ ; w H ] with H := (w0 + 1)In − w0wwᵀ

for some vector w = (w0, w) satisfying |w0| ≤ 1 and ‖w‖ = 1. This completes the proof. □
We can summarize Lemma 2.6 as follows: Any element V ∈ ∂BPK(z) is equal to

V = 0 or V = In+1 or V = (1/2) [ 1 wᵀ ; w H ] (2)

for some vector w ∈ Rn with ‖w‖ = 1 and some matrix H ∈ Rn×n of the form H =
(1 + α)In − αwwᵀ with some scalar α ∈ R satisfying |α| ≤ 1. Specifically, in statements
(a)–(c) we have w = z/‖z‖, whereas in statement (d), w can be any vector of length one.
Moreover, we have α = z0/‖z‖ in statement (a), α = 1 in statement (b), α = −1 in
statement (c), whereas there is no further specification of α in statement (d) (here the two
simple cases V = 0 and V = In+1 are always excluded).
The eigenvalues of any matrix V ∈ ∂BPK(z) can be given explicitly, as shown in the
following result.
Lemma 2.7 Let z = (z0, z) ∈ R × Rn and V ∈ ∂BPK(z). Assume that V ∉ {0, In+1} so
that V has the third representation in (2) with H = (1 + α)In − αwwᵀ for some scalar
α ∈ [−1, +1] and some vector w ∈ Rn satisfying ‖w‖ = 1. Then V has the two single
eigenvalues λ = 0 and λ = 1 as well as the eigenvalue λ = (1/2)(α + 1) with multiplicity
n − 1 (unless α = ±1, where the multiplicities change in an obvious way). Moreover, the
corresponding eigenvectors of V are given by

(−1, w), (1, w), and (0, vj), j = 1, . . . , n − 1,
where v1, . . . , vn−1 are arbitrary vectors that span the linear subspace {v ∈ Rn | vT w = 0}.
Proof. By assumption, we have

V = (1/2) [ 1 wᵀ ; w H ] with H = (1 + α)In − αwwᵀ

for some α ∈ [−1, +1] and some vector w satisfying ‖w‖ = 1. Now take an arbitrary vector
v ∈ Rn orthogonal to w, and let u = (0, vᵀ)ᵀ. Then an elementary calculation shows that
V u = λu holds for λ = (1/2)(α + 1). Hence this λ is an eigenvalue of V with multiplicity n − 1
since we can choose n − 1 linearly independent vectors v ∈ Rn such that vᵀw = 0. On the
other hand, if λ = 0, it is easy to see that V u = λu holds with u = (−1, wᵀ)ᵀ, whereas for
λ = 1 we have V u = λu by taking u = (1, wᵀ)ᵀ. This completes the proof. □
Note that Lemma 2.7 particularly implies λ ∈ [0, 1] for all eigenvalues λ of V . This
fact can alternatively be derived from the fact that PK is a projection mapping, without
referring to the explicit representation of V as given in Lemma 2.6.
We close this section by pointing out an interesting relation between the matrix V ∈ ∂BPK(z) and the so-called arrow matrix

Arw(z) := [ z0 zᵀ ; z z0In ] ∈ R(n+1)×(n+1)

associated with z = (z0, z) ∈ R × Rn, which frequently occurs in the context of interior-point methods and in the analysis of second-order cone problems, see, e.g., [1]. To this end, consider the case where PK is differentiable at z, excluding the two trivial cases where P′K(z) = 0 and P′K(z) = In+1, cf. Lemma 2.5. Then by Lemma 2.7, the eigenvectors of the matrix V = P′K(z) are given by

(−1, z/‖z‖), (1, z/‖z‖), and (0, vj), j = 1, . . . , n − 1, (3)

where v1, . . . , vn−1 comprise an orthogonal basis of the linear subspace {v ∈ Rn | vᵀz = 0}. However, an elementary calculation shows that these are also the eigenvectors of the arrow matrix Arw(z), with corresponding single eigenvalues λ1 = z0 − ‖z‖, λ2 = z0 + ‖z‖ and the multiple eigenvalue λi = z0, i = 3, . . . , n + 1. Therefore, although the eigenvalues of V = P′K(z) and Arw(z) are different, both matrices have the same set of eigenvectors.
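This relation is easy to check numerically: since V = P′K(z) and Arw(z) are symmetric and share a complete set of eigenvectors, they must in particular commute. A small NumPy sketch with hand-picked sample data (our own choice, not from the paper):

```python
import numpy as np

z0, zbar = 1.0, np.array([3.0, 4.0])   # a point with -||zbar|| < z0 < ||zbar||
nz = np.linalg.norm(zbar)              # = 5
w, alpha = zbar / nz, z0 / nz

# V = P'_K(z) from Lemma 2.5 ...
H = (alpha + 1.0) * np.eye(2) - alpha * np.outer(w, w)
V = 0.5 * np.block([[np.ones((1, 1)), w[None, :]],
                    [w[:, None],      H]])

# ... and the arrow matrix Arw(z)
Arw = np.block([[np.array([[z0]]), zbar[None, :]],
                [zbar[:, None],    z0 * np.eye(2)]])

# shared eigenvectors from (3): (-1, w) and (1, w)
u1 = np.concatenate(([-1.0], w))
u2 = np.concatenate(([1.0], w))
```

Here Arw(z)u1 = (z0 − ‖z‖)u1 and Arw(z)u2 = (z0 + ‖z‖)u2, while V u1 = 0 and V u2 = u2, exactly as the discussion above predicts.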
3 Nonlinear Second-Order Cone Programs
In this section, we consider the nonlinear second-order cone program (nonlinear SOCP for
short)
min f(x) s.t. Ax = b, x ∈ K,
where f : Rn → R is a twice continuously differentiable function, A ∈ Rm×n is a given
matrix, b ∈ Rm is a given vector, and K = K1 × · · · × Kr is the Cartesian product of
second-order cones Ki ⊆ Rni with n1 + · · ·+ nr = n. Here we are particularly interested in
the case where the objective function f is nonlinear. The special case where f is a linear
function will be discussed in more detail in the subsequent section.
Under certain conditions like convexity of f and a Slater-type constraint qualification
[4], solving the nonlinear SOCP is equivalent to solving the corresponding KKT conditions,
which can be written as follows:
∇f(x) − Aᵀµ − λ = 0,
Ax = b,
xi ∈ Ki, λi ∈ Ki, xiᵀλi = 0, i = 1, . . . , r.
Using Lemma 2.1, it follows that these KKT conditions are equivalent to the system of
equations M(z) = 0, where M : Rn × Rm × Rn → Rn × Rm × Rn is defined by
M(z) := M(x, µ, λ) := [ ∇f(x) − Aᵀµ − λ ; Ax − b ; x1 − PK1(x1 − λ1) ; … ; xr − PKr(xr − λr) ]. (4)
Then we can apply the nonsmooth Newton method [18, 19, 17]

zk+1 := zk − Wk⁻¹ M(zk), Wk ∈ ∂BM(zk), k = 0, 1, 2, . . . , (5)
to the system of equations M(z) = 0 in order to solve the nonlinear SOCP or, at least, the
corresponding KKT conditions. Our aim is to show fast local convergence of this iterative
method. In view of the results in [19, 18], we have to guarantee that, on one hand, the
mapping M , though not differentiable everywhere, is still sufficiently ‘smooth’, and, on the
other hand, it satisfies a local nonsingularity condition under suitable assumptions.
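To make the iteration (5) concrete, the following self-contained NumPy sketch applies it to a deliberately simple toy instance, min ½‖x − a‖² s.t. x ∈ K, with a single cone and no equality constraints (this instance and all names are our own; at points where PK is not differentiable one would select an element of the B-subdifferential per Lemma 2.6, while this sketch simply uses the closed-form Jacobian of Lemma 2.5 on the smooth pieces):

```python
import numpy as np

def proj_soc(z):
    # projection onto the second-order cone K (Lemma 2.2)
    z0, zb = z[0], z[1:]
    nz = np.linalg.norm(zb)
    if z0 >= nz:
        return z.astype(float).copy()
    if z0 <= -nz:
        return np.zeros_like(z, dtype=float)
    t = 0.5 * (z0 + nz)
    return np.concatenate(([t], (t / nz) * zb))

def proj_soc_jac(z):
    # Jacobian of the projection (Lemma 2.5); at kinks one would instead
    # pick any element of the B-subdifferential described in Lemma 2.6
    z0, zb = z[0], z[1:]
    n, nz = len(zb), np.linalg.norm(zb)
    if z0 <= -nz:
        return np.zeros((n + 1, n + 1))
    if z0 >= nz:
        return np.eye(n + 1)
    w = zb / nz
    H = (z0 / nz + 1.0) * np.eye(n) - (z0 / nz) * np.outer(w, w)
    return 0.5 * np.block([[np.ones((1, 1)), w[None, :]], [w[:, None], H]])

def semismooth_newton(a, x, lam, tol=1e-12, iters=20):
    """Iteration (5) for the toy problem  min 1/2 ||x - a||^2  s.t. x in K
    (single cone, no equality constraints), whose KKT system is
    M(x, lam) = (x - a - lam, x - PK(x - lam)) = 0."""
    n = len(a)
    for _ in range(iters):
        M = np.concatenate([x - a - lam, x - proj_soc(x - lam)])
        if np.linalg.norm(M) < tol:
            break
        V = proj_soc_jac(x - lam)              # builds an element Wk of the B-subdifferential
        W = np.block([[np.eye(n), -np.eye(n)],
                      [np.eye(n) - V, V]])
        d = np.linalg.solve(W, -M)
        x, lam = x + d[:n], lam + d[n:]
    return x, lam
```

On this instance the KKT system forces x = PK(a) and λ = x − a, and starting from (x, λ) = (a, 0) the iteration reaches this solution in a single step; for a = (0, 3, 4) that gives x = (2.5, 1.5, 2.0) on the boundary of K with xᵀλ = 0.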
The required smoothness property of M is summarized in the following result.
Theorem 3.1 The mapping M defined by (4) is semismooth. Moreover, if the Hessian
∇2f is locally Lipschitz continuous, then the mapping M is strongly semismooth.
Proof. Note that a continuously differentiable mapping is semismooth. Moreover, if the
Jacobian of a differentiable mapping is locally Lipschitz continuous, then this mapping is
strongly semismooth. Now Lemma 2.3 and the fact that a given mapping is (strongly)
semismooth if and only if all component functions are (strongly) semismooth yield the
desired result. □
Our next step is to provide suitable conditions which guarantee the nonsingularity of all
elements of the B-subdifferential of M at a KKT point. This requires some more work,
and we begin with the following general result.
Proposition 3.2 Let H ∈ Rn×n be symmetric, and A ∈ Rm×n. Let Vᵃ, Vᵇ ∈ Rn×n be two
symmetric positive semidefinite matrices such that their sum Vᵃ + Vᵇ is positive definite
and Vᵃ and Vᵇ have a common basis of eigenvectors, so that there exist an orthogonal
matrix Q ∈ Rn×n and diagonal matrices Dᵃ = diag(a1, . . . , an) and Dᵇ = diag(b1, . . . , bn)
satisfying Vᵃ = QDᵃQᵀ, Vᵇ = QDᵇQᵀ as well as aj ≥ 0, bj ≥ 0 and aj + bj > 0 for all
j = 1, . . . , n. Let the index set {1, . . . , n} be partitioned as {1, . . . , n} = α ∪ β ∪ γ, where

α := {j | aj > 0, bj = 0}, β := {j | aj > 0, bj > 0}, γ := {j | aj = 0, bj > 0},

and let Qα, Qβ, and Qγ denote the submatrices of Q consisting of the columns from Q
corresponding to the index sets α, β, and γ, respectively. Assume that the following two
conditions hold:

(a) The matrix H is positive semidefinite on the subspace Sα := {d ∈ Rn | Ad = 0,
Qαᵀd = 0}, and positive definite on the subspace Sα,β := {d ∈ Rn | Ad = 0, Qαᵀd = 0,
Qβᵀd = 0} = Sα ∩ {d ∈ Rn | Qβᵀd = 0}.

(b) The matrix (AQβ, AQγ) has full row rank.

Then the matrix

W := [ H −Aᵀ −I ; A 0 0 ; Vᵃ 0 Vᵇ ]

is nonsingular.
Proof. Let y = (y(1), y(2), y(3)) ∈ Rn × Rm × Rn be any vector such that Wy = 0. Then

Hy(1) − Aᵀy(2) − y(3) = 0, (6)
Ay(1) = 0, (7)
Vᵃy(1) + Vᵇy(3) = 0. (8)

Using the spectral decompositions of Vᵃ and Vᵇ, equation (8) can be rewritten as

Dᵃȳ(1) + Dᵇȳ(3) = 0 with ȳ(1) := Qᵀy(1), ȳ(3) := Qᵀy(3).

In view of the definitions of the index sets α, β, and γ, this implies

ȳ(1)α = 0, ȳ(3)γ = 0, and Dᵃβ ȳ(1)β + Dᵇβ ȳ(3)β = 0. (9)

This shows that

(y(1))ᵀy(3) = (y(1))ᵀQQᵀy(3) = (ȳ(1))ᵀȳ(3) = (ȳ(1)β)ᵀȳ(3)β = −(ȳ(1)β)ᵀ(Dᵇβ)⁻¹Dᵃβ ȳ(1)β.

Therefore, premultiplying (6) by (y(1))ᵀ, we obtain from (7) that

(y(1))ᵀHy(1) = (y(1))ᵀy(3) = −(ȳ(1)β)ᵀ(Dᵇβ)⁻¹Dᵃβ ȳ(1)β ≤ 0, (10)

and the inequality is strict whenever ȳ(1)β ≠ 0. However, in view of (7) and 0 = ȳ(1)α =
Qαᵀy(1), cf. (9), we obtain from the positive semidefiniteness of H on the subspace Sα that
ȳ(1)β = 0, i.e., Qβᵀy(1) = 0. This in turn gives ȳ(3)β = 0 by (9). Using (10) and the fact that
y(1) ∈ Sα,β, we then obtain y(1) = 0 from the assumed positive definiteness of H on the
subspace Sα,β. Now premultiplying (6) by Qᵀ and writing down the corresponding block
equations for the β and γ blocks, we obtain from y(1) = 0, ȳ(3)β = 0, and ȳ(3)γ = 0 that

Qβᵀ Aᵀ y(2) = 0 and Qγᵀ Aᵀ y(2) = 0.

Hence y(2) = 0 follows from Assumption (b). Finally, by (6), we have y(3) = 0. □
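Proposition 3.2 can be probed on a small hand-built instance (our own toy data, with Q = I so that the eigenvector structure is trivial); the first choice of A below satisfies condition (b) and yields a nonsingular W, while the second violates (b) and produces a singular W:

```python
import numpy as np

# Toy instance with n = 3, Q = I, so alpha = {0}, beta = {1}, gamma = {2}.
Da = np.diag([1.0, 1.0, 0.0])        # eigenvalues a_j of V^a
Db = np.diag([0.0, 1.0, 1.0])        # eigenvalues b_j of V^b (a_j + b_j > 0)
H = np.eye(3)                        # positive definite everywhere -> (a) holds

def build_W(A):
    m = A.shape[0]
    return np.block([[H,  -A.T,              -np.eye(3)],
                     [A,   np.zeros((m, m)),  np.zeros((m, 3))],
                     [Da,  np.zeros((3, m)),  Db]])

A_ok = np.array([[0.0, 1.0, 1.0]])   # (A Q_beta, A Q_gamma) = (1 1): full row rank
A_bad = np.array([[1.0, 0.0, 0.0]])  # (A Q_beta, A Q_gamma) = (0 0): (b) fails
```

For A_bad a kernel vector of W can be written down directly: y(1) = 0, y(2) = 1, y(3) = −Aᵀ = (−1, 0, 0) satisfies all three block equations because Vᵇ annihilates y(3), so condition (b) is not merely technical.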
In order to apply Proposition 3.2 to the (generalized) Jacobian of the mapping M at a
KKT point, we first introduce some more notation:
intKi := {xi | xi0 > ‖xi‖} denotes the interior of Ki,

bdKi := {xi | xi0 = ‖xi‖} denotes the boundary of Ki, and

bd+Ki := bdKi \ {0} is the boundary of Ki excluding the origin.
We also call a KKT point z∗ = (x∗, µ∗, λ∗) of the nonlinear SOCP strictly complementary
if x∗i + λ∗i ∈ intKi holds for all block components i = 1, . . . , r. This notation enables us to
restate the following result from [1].
Lemma 3.3 Let z∗ = (x∗, µ∗, λ∗) be a KKT point of the nonlinear SOCP. Then precisely
one of the following six cases holds for each block pair (x∗i , λ∗i ), i = 1, . . . , r:
x∗i             λ∗i             SC
x∗i ∈ intKi     λ∗i = 0         yes
x∗i = 0         λ∗i ∈ intKi     yes
x∗i ∈ bd+Ki     λ∗i ∈ bd+Ki     yes
x∗i ∈ bd+Ki     λ∗i = 0         no
x∗i = 0         λ∗i ∈ bd+Ki     no
x∗i = 0         λ∗i = 0         no
The last column in the table indicates whether or not strict complementarity (SC) holds.
We also need the following simple result which, in particular, shows that the projection mapping PKi involved in the definition of the mapping M is continuously differentiable at si := x∗i − λ∗i for any block component i satisfying strict complementarity.
Lemma 3.4 Let z∗ = (x∗, µ∗, λ∗) be a KKT point of the nonlinear SOCP. Then the following statements hold for each block pair (x∗i, λ∗i):

(a) If x∗i ∈ intKi and λ∗i = 0, then PKi is continuously differentiable at si := x∗i − λ∗i with P′Ki(si) = I.

(b) If x∗i = 0 and λ∗i ∈ intKi, then PKi is continuously differentiable at si := x∗i − λ∗i with P′Ki(si) = 0.

(c) If x∗i ∈ bd+Ki and λ∗i ∈ bd+Ki, then PKi is continuously differentiable at si := x∗i − λ∗i.

Proof. (a) We have si = x∗i − λ∗i = x∗i ∈ intKi. Hence, locally around si, the projection mapping PKi is the identity mapping, so that its Jacobian equals the identity matrix.

(b) We have si = x∗i − λ∗i = −λ∗i with λ∗i ∈ intKi. Hence, locally around si, the projection PKi maps everything onto the zero vector. This implies P′Ki(si) = 0.
(c) Write x∗i = (x∗i0, x∗i), λ∗i = (λ∗i0, λ∗i). By assumption, we have x∗i0 = ‖x∗i‖ ≠ 0 and λ∗i0 = ‖λ∗i‖ ≠ 0. Let si := x∗i − λ∗i and write si = (si0, si). In view of Lemma 2.4, it suffices to show that si0 ≠ ±‖si‖. First suppose that si0 = ‖si‖. Then x∗i0 − λ∗i0 = ‖x∗i − λ∗i‖. However, since x∗i ≠ 0 and λ∗i ≠ 0, it follows from [1, Lemma 15] that there is a constant α > 0 such that λ∗i0 = αx∗i0 and λ∗i = −αx∗i. Consequently, we obtain (1 − α)x∗i0 = (1 + α)‖x∗i‖ = (1 + α)x∗i0, which implies α = 0, a contradiction to α > 0. In a similar way, one gets a contradiction for the case si0 = −‖si‖. □
We are now almost in a position to apply Proposition 3.2 to the Jacobian of the mapping
M at a KKT point z∗ = (x∗, µ∗, λ∗) provided that this KKT point satisfies strict com-
plementarity. This strict complementarity assumption will be removed later, but for the
moment it is quite convenient to assume this condition. For example, it then follows from
Lemma 3.3 that the three index sets
JI := {i | x∗i ∈ intKi, λ∗i = 0},
JB := {i | x∗i ∈ bd+Ki, λ∗i ∈ bd+Ki}, (11)
J0 := {i | x∗i = 0, λ∗i ∈ intKi}
form a partition of the block indices i = 1, . . . , r. Here, the subscripts I, B and 0 indicate
whether the block component x∗i belongs to the interior of the cone Ki, or x∗i belongs to
the boundary of Ki (excluding the zero vector), or x∗i is the zero vector.
Let Vi := P′Ki(x∗i − λ∗i). Then Lemma 3.4 implies that

Vi = I ∀i ∈ JI and Vi = 0 ∀i ∈ J0. (12)
To get a similar representation for indices i ∈ JB, we need the spectral decompositions
Vi = QiDiQiᵀ of the matrices Vi. Since strict complementarity holds, it follows from Lemma
2.7 that each Vi has precisely one eigenvalue equal to zero and precisely one eigenvalue equal
to one, whereas all other eigenvalues are strictly between zero and one. Without loss of
generality, we can assume that the eigenvalues of Vi are ordered in such a way that

Di = diag(0, ∗, . . . , ∗, 1) ∀i ∈ JB, (13)

where ∗ denotes the remaining eigenvalues that lie in the open interval (0, 1). Correspondingly, we also partition the orthogonal matrices Qi as

Qi = (qi, Q̂i, q′i) ∀i ∈ JB, (14)

where qi ∈ Rni denotes the first column of Qi, q′i ∈ Rni is the last column of Qi, and
Q̂i ∈ Rni×(ni−2) contains the remaining ni − 2 middle columns of Qi. We also use the
following partitionings of the matrices Qi:

Qi = (qi, Q̃i) = (Q̄i, q′i) ∀i ∈ JB, (15)

where, again, qi ∈ Rni and q′i ∈ Rni are the first and the last columns of Qi, respectively,
and Q̃i = (Q̂i, q′i) ∈ Rni×(ni−1) and Q̄i = (qi, Q̂i) ∈ Rni×(ni−1) contain the last and the first
ni − 1 columns of Qi, respectively.
worth noticing that, by (3), the vectors qi and q′i are actually given by

qi = (1/√2) (−1, (x∗i − λ∗i)/‖x∗i − λ∗i‖) and q′i = (1/√2) (1, (x∗i − λ∗i)/‖x∗i − λ∗i‖).

(Note that we have x∗i − λ∗i ≠ 0 since x∗iᵀλ∗i = 0, x∗i ∈ bd+Ki, λ∗i ∈ bd+Ki [1, Lemma 15].)
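The eigenvector structure of Lemma 2.7 together with the column partitioning (14)/(15) can be verified numerically: stacking the normalized eigenvectors into an orthogonal matrix Q diagonalizes V. A sketch with arbitrary sample data (the choices α = 0.3 and a random unit vector w are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
w = rng.normal(size=n)
w /= np.linalg.norm(w)
alpha = 0.3                                   # any value in (-1, 1)

# V as in (2): V = 1/2 [1 w^T ; w H] with H = (1 + alpha) I - alpha w w^T
H = (1.0 + alpha) * np.eye(n) - alpha * np.outer(w, w)
V = 0.5 * np.block([[np.ones((1, 1)), w[None, :]],
                    [w[:, None],      H]])

# orthonormal eigenvectors: q (eigenvalue 0), q' (eigenvalue 1), and the
# middle columns (0, v_j) with v_1, ..., v_{n-1} spanning {v : w^T v = 0}
q = np.concatenate(([-1.0], w)) / np.sqrt(2.0)
qp = np.concatenate(([1.0], w)) / np.sqrt(2.0)
B = np.linalg.svd(np.outer(w, w))[0][:, 1:]   # orthonormal complement of span{w}
mid = np.vstack([np.zeros(n - 1), B])
Q = np.column_stack([q, mid, qp])

D = Q.T @ V @ Q                               # diag(0, (1+alpha)/2, ..., (1+alpha)/2, 1)
```

The factor 1/√2 normalizes the first and last columns, which is exactly why it appears in the formulas for qi and q′i above.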
We are now able to prove the following nonsingularity result under the assumption that
the given KKT point satisfies strict complementarity.
Theorem 3.5 Let z∗ = (x∗, µ∗, λ∗) be a strictly complementary KKT point of the nonlinear SOCP and let the (block) index sets JI, JB, J0 be defined by (11). Assume the following conditions:

(a) The Hessian ∇2f(x∗) is positive semidefinite on the subspace S := {d ∈ Rn | Ad = 0, di = 0 ∀i ∈ J0, qiᵀdi = 0 ∀i ∈ JB}, and positive definite on the subspace S̄ := {d ∈ Rn | Ad = 0, di = 0 ∀i ∈ J0, Q̄iᵀdi = 0 ∀i ∈ JB} = S ∩ {d ∈ Rn | Q̂iᵀdi = 0 ∀i ∈ JB}, where a vector d ∈ Rn is partitioned into d = (d1, . . . , dr) with block components di ∈ Rni, and Q̄i = (qi, Q̂i) denotes the matrix of the first ni − 1 columns of Qi, cf. (15).

(b) The matrix (Ai (i ∈ JI), AiQ̃i (i ∈ JB)) has linearly independent rows, where Ai is the submatrix of A consisting of those columns which correspond to the block index i, and Q̃i = (Q̂i, q′i) consists of the last ni − 1 columns of Qi.

Then the Jacobian M′(z∗) exists and is nonsingular.
Proof. The existence of the Jacobian M′(z∗) follows immediately from the assumed strict
complementarity of the given KKT point together with Lemma 3.4. A simple calculation
shows that

M′(z∗) = [ ∇2f(x∗) −Aᵀ −I ; A 0 0 ; I − V 0 V ]

for the block diagonal matrix V = diag(V1, . . . , Vr) with Vi = P′Ki(x∗i − λ∗i). Therefore,
taking into account that all eigenvalues of the matrix V belong to the interval [0, 1] by
Lemma 2.7, we are able to apply Proposition 3.2 (with Vᵃ := I − V and Vᵇ := V) as soon
as we have identified the index sets α, β, γ ⊆ {1, . . . , n}.

For each i ∈ JI, we have Vi = I (see (12)) and, therefore, Qi = I and Di = I. Hence
all components j from the block components i ∈ JI belong to the index set γ.
On the other hand, for each i ∈ J0, we have Vi = 0 (see (12)), and this corresponds to
Qi = I and Di = 0. Hence all components j from the block components i ∈ J0 belong to
the index set α.

Finally, let i ∈ JB. Then Vi = QiDiQiᵀ with Di = diag(0, ∗, . . . , ∗, 1) and Qi =
(qi, Q̂i, q′i). Hence the first component for each block index i ∈ JB is an element of the
index set α, the last component for each block index i ∈ JB belongs to the index set γ,
and all the remaining middle components belong to the index set β.

Taking these considerations into account, we immediately see that our assumptions (a)
and (b) correspond precisely to the assumptions (a) and (b) in Proposition 3.2. □
We now want to extend Theorem 3.5 to the case where strict complementarity is violated.
Let z∗ = (x∗, µ∗, λ∗) be an arbitrary KKT point of the nonlinear SOCP, and let JI, JB, J0
denote the index sets defined by (11). In view of Lemma 3.3, in addition to these sets, we
also need to consider the three index sets
JB0 := {i | x∗i ∈ bd+Ki, λ∗i = 0},
J0B := {i | x∗i = 0, λ∗i ∈ bd+Ki}, (16)
J00 := {i | x∗i = 0, λ∗i = 0},
which correspond to the block indices where strict complementarity is violated. Note
that these index sets have double subscripts; the first (resp. second) subscript indicates
whether x∗i (resp. λ∗i ) is on the boundary of Ki (excluding zero) or equal to the zero vector.
The following result summarizes the structure of the matrices Vi ∈ ∂BPKi(x∗i − λ∗i ) for
i ∈ JB0 ∪ J0B ∪ J00. Hence it is a counterpart of Lemma 3.4 in the general case.
Lemma 3.6 Let i ∈ JB0 ∪ J0B ∪ J00 and Vi ∈ ∂BPKi(x∗i − λ∗i). Then the following statements hold:

(a) If i ∈ JB0, then we have either Vi = I or Vi = QiDiQiᵀ with Di = diag(0, 1, . . . , 1) and Qi = (qi, Q̃i), where, as in (15), Q̃i consists of the last ni − 1 columns of Qi.

(b) If i ∈ J0B, then we have either Vi = 0 or Vi = QiDiQiᵀ with Di = diag(0, . . . , 0, 1) and Qi = (Q̄i, q′i), where Q̄i consists of the first ni − 1 columns of Qi.

(c) If i ∈ J00, then we have Vi = I or Vi = 0 or Vi = QiDiQiᵀ with Di and Qi given by (13) and (14), respectively, or by Di = diag(0, 1, . . . , 1) and Qi = (qi, Q̃i), or by Di = diag(0, . . . , 0, 1) and Qi = (Q̄i, q′i).
Proof. First let i ∈ JB0. Then si := x∗i − λ∗i = x∗i ∈ bd+Ki. Therefore, if we write
si = (si0, si), it follows that si0 = ‖si‖ and si ≠ 0. Statement (a) then follows immediately
from Lemma 2.6 (b) in combination with Lemma 2.7.

In a similar way, the other two statements can be derived by using Lemma 2.6 (c) and
(d), respectively, together with Lemma 2.7 in order to get the eigenvalues. Here the five
possible choices in statement (c) depend, in particular, on the value of the scalar w0 in
Lemma 2.6 (d) (namely w0 ∈ (−1, 1), w0 = 1, and w0 = −1). □
The previous result enables us to generalize Theorem 3.5 to the case where strict com-
plementarity does not hold. Note that, from now on, we use the spectral decompositions
Vi = QiDiQiᵀ and the associated partitionings (13)–(15) for all i ∈ JB, as well as those
specified in Lemma 3.6 for all indices i ∈ JB0 ∪ J0B ∪ J00.
Theorem 3.7 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT
point of the nonlinear SOCP and let the (block) index sets JI , JB, J0, JB0, J0B, J00 be defined
by (11) and (16). Suppose that for any partitioning JB0 = J1B0 ∪ J2
B0, any partitioning
J0B = J10B ∪ J2
0B, and any partitioning J00 = J100 ∪ J2
00 ∪ J300 ∪ J4
00 ∪ J500, the following two
conditions hold:
(a) The Hessian ∇2f(x∗) is positive semidefinite on the subspace

S := { d ∈ Rn | Ad = 0,
       di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
       qTi di = 0 ∀i ∈ JB ∪ J2B0 ∪ J300 ∪ J400,
       QTi di = 0 ∀i ∈ J20B ∪ J500 },

and positive definite on the subspace

S̄ := { d ∈ Rn | Ad = 0,
       di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
       qTi di = 0 ∀i ∈ J2B0 ∪ J400,
       QTi di = 0 ∀i ∈ JB ∪ J20B ∪ J300 ∪ J500 }
   = S ∩ { d ∈ Rn | QTi di = 0 ∀i ∈ JB ∪ J300 }.
(b) The matrix
( Ai (i ∈ JI ∪ J1B0 ∪ J100), AiQi (i ∈ JB ∪ J2B0 ∪ J300 ∪ J400), Aiq′i (i ∈ J20B ∪ J500) )
has linearly independent rows.
Then all matrices W ∈ ∂BM(z∗) are nonsingular.
Proof. Choose W ∈ ∂BM(z∗) arbitrarily. Then a simple calculation shows that

W = [ ∇2f(x∗)   −AT   −I
          A       0     0
        I − V     0     V ]
for a suitable block diagonal matrix V = diag(V1, . . . , Vr) with Vi ∈ ∂BPKi(x∗i − λ∗i ). In
principle, the proof is now similar to the one of Theorem 3.5: We want to apply Proposition
3.2 (with V a := I − V and V b := V ) by identifying the index sets α, β, γ. The situation
is, however, more complicated here, since these index sets may depend on the particular
element W chosen from the B-subdifferential ∂BM(z∗). To this end, we take a closer look
especially at the new index sets JB0, J0B, and J00. In view of Lemma 3.6, we further
partition these index sets into
JB0 = J1B0 ∪ J2B0,
J0B = J10B ∪ J20B,
J00 = J100 ∪ J200 ∪ J300 ∪ J400 ∪ J500

with

J1B0 := {i | Vi = I},  J2B0 := JB0 \ J1B0,
J10B := {i | Vi = 0},  J20B := J0B \ J10B

and

J100 := {i | Vi = I},
J200 := {i | Vi = 0},
J300 := {i | Vi = QiDiQTi with Di and Qi given by (13) and (14), respectively},
J400 := {i | Vi = QiDiQTi with Di = diag(0, 1, . . . , 1) and Qi = (qi, Qi)},
J500 := {i | Vi = QiDiQTi with Di = diag(0, . . . , 0, 1) and Qi = (Qi, q′i)}.
Using these definitions and Lemmas 3.4 and 3.6, we see that the following indices j belong
to the index set α in Proposition 3.2:
• All indices j belonging to one of the block indices i ∈ J0 ∪ J10B ∪ J200, with Qi = I being the corresponding orthogonal matrix.
• The first component belonging to a block index i ∈ JB ∪ J2B0 ∪ J300 ∪ J400, with qi being the first column of the corresponding orthogonal matrix Qi.

• The first ni − 1 components belonging to each block index i ∈ J20B ∪ J500, with Qi consisting of the first ni − 1 columns of the corresponding orthogonal matrix Qi.
We next consider the index set β in Proposition 3.2. In view of Lemmas 3.4 and 3.6, the
following indices j belong to this set:
• All middle indices belonging to a block index i ∈ JB ∪ J300, with Qi consisting of the
middle ni − 2 columns of the corresponding orthogonal matrix Qi.
Using Lemmas 3.4 and 3.6 again, we finally see that the following indices j belong to the
index set γ in Proposition 3.2:
• All indices j belonging to one of the block indices i ∈ JI ∪ J1B0 ∪ J100. The corresponding orthogonal matrix is Qi = I.

• The last index of each block index i ∈ JB ∪ J20B ∪ J300 ∪ J500, with q′i being the last column of the corresponding orthogonal matrix Qi.

• The last ni − 1 indices j belonging to a block index i ∈ J2B0 ∪ J400, with Qi consisting of the last ni − 1 columns of the corresponding orthogonal matrix Qi.
From these observations, it follows that the condition QTα d = 0 is equivalent to

di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
qTi di = 0 ∀i ∈ JB ∪ J2B0 ∪ J300 ∪ J400,
QTi di = 0 ∀i ∈ J20B ∪ J500.
Furthermore, the requirement QTβ d = 0 can be rewritten as

QTi di = 0 ∀i ∈ JB ∪ J300.

Since Qi = (qi, Qi) ∈ Rni×(ni−1), the two conditions QTα d = 0 and QTβ d = 0 together become

di = 0 ∀i ∈ J0 ∪ J10B ∪ J200,
qTi di = 0 ∀i ∈ J2B0 ∪ J400,
QTi di = 0 ∀i ∈ JB ∪ J20B ∪ J300 ∪ J500.
This shows that assumption (a) of this theorem coincides with the corresponding assump-
tion in Proposition 3.2.
In a similar way, we can verify the equivalence between assumption (b) in this theorem
and the corresponding condition in Proposition 3.2. In fact, the above identifications of
the index sets β and γ show that the matrix (AQβ, AQγ) has linearly independent rows if and only if the matrix

( AiQi (i ∈ JB ∪ J300), Ai (i ∈ JI ∪ J1B0 ∪ J100), Aiq′i (i ∈ JB ∪ J20B ∪ J300 ∪ J500), AiQi (i ∈ J2B0 ∪ J400) )

has linearly independent rows. Since Qi = (Qi, q′i) ∈ Rni×(ni−1), it is then easy to see that this condition is equivalent to our assumption (b). Hence the assertion follows from Proposition 3.2. □
Note that, in the case of a strictly complementary KKT point, Theorem 3.7 reduces to
Theorem 3.5. Using Theorems 3.1 and 3.7 along with [18], we get the following result.
Theorem 3.8 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT
point of the nonlinear SOCP, and suppose that the assumptions of Theorem 3.7 hold at this
KKT point. Then the nonsmooth Newton method (5) applied to the system of equations
M(z) = 0 is locally superlinearly convergent. If, in addition, f has a locally Lipschitz
continuous Hessian, then it is locally quadratically convergent.
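For concreteness, the nonsmooth Newton iteration (5) for the system M(z) = 0 takes the form stated explicitly for the linear case in Section 4:

```latex
z^{k+1} := z^k - W_k^{-1} M(z^k), \qquad W_k \in \partial_B M(z^k), \qquad k = 0, 1, 2, \ldots
```

Under the assumptions of Theorem 3.8, these iterates converge superlinearly to z∗ from any sufficiently close starting point z0, and quadratically if ∇2f is locally Lipschitz continuous.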
4 Linear Second-Order Cone Programs
We consider the linear second-order cone program (linear SOCP for short)
min cT x s.t. Ax = b, x ∈ K,
where A ∈ Rm×n, b ∈ Rm, and c ∈ Rn are the given data, and K = K1 × · · · × Kr is the
Cartesian product of second-order cones Ki ⊆ Rni with n1 + · · · + nr = n. Clearly the
linear SOCP is a special case of the nonlinear SOCP discussed in the previous section. In
particular, we may rewrite the corresponding KKT conditions as a system of equations
M0(z) = 0, where M0 : Rn × Rm × Rn → Rn × Rm × Rn is defined by

M0(z) := M0(x, µ, λ) := [ c − AT µ − λ
                          Ax − b
                          x1 − PK1(x1 − λ1)
                          ...
                          xr − PKr(xr − λr) ] .          (17)
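Each block row xi − PKi(xi − λi) of (17) only needs the projection onto a single second-order cone, which has a simple closed form based on the sign pattern of z0 versus ‖z̄‖. The following minimal NumPy sketch (the function name proj_soc and the test data are ours, not from the paper) implements this projection and checks the Moreau decomposition z = PK(z) − PK(−z), which holds because K is self-dual.

```python
import numpy as np

def proj_soc(z):
    """Projection of z = (z0, z_bar) onto the second-order cone
    K = {(z0, z_bar) : z0 >= ||z_bar||}."""
    z0, zbar = z[0], z[1:]
    r = np.linalg.norm(zbar)
    if z0 >= r:                       # z already lies in K
        return z.copy()
    if z0 <= -r:                      # z lies in the polar cone -K
        return np.zeros_like(z)
    t = 0.5 * (z0 + r)                # positive spectral value of z
    return np.concatenate(([t], (t / r) * zbar))

# Moreau decomposition: since K is self-dual, z = P_K(z) - P_K(-z).
z = np.array([0.3, -1.2, 0.8])
p = proj_soc(z)
assert np.allclose(z, p - proj_soc(-z))
assert p[0] >= np.linalg.norm(p[1:]) - 1e-12   # P_K(z) lies in K
```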
An immediate consequence of Theorem 3.1 is the fact that the mapping M0 is strongly
semismooth. On the other hand, Theorems 3.5 and 3.7 do not apply to the linear SOCP,
since the second-order conditions do not hold. We therefore need another suitable condition
that guarantees the required nonsingularity of the (generalized) Jacobian of M0 at a KKT
point z∗ = (x∗, µ∗, λ∗). Before presenting such a result, we follow the presentation of the
previous section and first state a general proposition.
Proposition 4.1 Let A ∈ Rm×n. Let V a, V b ∈ Rn×n be two symmetric positive semidefinite matrices such that their sum V a + V b is positive definite and V a and V b have a common basis of eigenvectors, so that there exist an orthogonal matrix Q ∈ Rn×n and diagonal matrices Da = diag(a1, . . . , an), Db = diag(b1, . . . , bn) satisfying V a = QDaQT , V b = QDbQT and aj ≥ 0, bj ≥ 0, aj + bj > 0 for all j = 1, . . . , n. Let the index set {1, . . . , n} be partitioned as {1, . . . , n} = α ∪ β ∪ γ with

α := {j | aj > 0, bj = 0},  β := {j | aj > 0, bj > 0},  γ := {j | aj = 0, bj > 0}.
Assume that the following two conditions hold:
(a) The matrix AQγ has full column rank.
(b) The matrix (AQβ, AQγ) has full row rank.
Then the matrix

W0 := [   0    −AT   −I
          A     0     0
         V a    0    V b ]

is nonsingular.
Proof. Let y = (y(1), y(2), y(3)) ∈ Rn × Rm × Rn be an arbitrary vector such that W0y = 0.
Then
AT y(2) + y(3) = 0, (18)
Ay(1) = 0, (19)
V ay(1) + V by(3) = 0. (20)
Using the spectral decompositions of V a and V b, equation (20) can be rewritten as
Day(1) + Dby(3) = 0 with y(1) := QT y(1), y(3) := QT y(3).
Taking into account the definitions of the index sets α, β, and γ, we obtain
y(1)α = 0, y(3)γ = 0, and Daβ y(1)β + Dbβ y(3)β = 0.    (21)
Premultiplying (18) by QT , using the definitions of y(1), y(3), and writing down the resulting
equations componentwise for each block corresponding to the indices in α, β, and γ, we get
QTα AT y(2) + y(3)α = 0,    (22)
QTβ AT y(2) + y(3)β = 0,    (23)
QTγ AT y(2) = 0.    (24)
Furthermore, equation (19) can be written as
0 = Ay(1) = AQQT y(1) = AQ y(1) = AQβ y(1)β + AQγ y(1)γ .    (25)
Using the nonsingularity of the diagonal submatrices Daβ, Dbβ, we obtain from (21) and (23) that

y(1)β = −(Daβ)−1Dbβ y(3)β = (Daβ)−1Dbβ QTβ AT y(2).
Substituting this expression for y(1)β in (25) yields

AQβ(Daβ)−1Dbβ QTβ AT y(2) + AQγ y(1)γ = 0.
Premultiplying by (y(2))T and using (24), we have
(y(2))T AQβ(Daβ)−1Dbβ QTβ AT y(2) = 0.
However, by the definition of the index set β, the matrix (Daβ)−1Dbβ is positive definite.
Consequently, we obtain
QTβ AT y(2) = 0. (26)
Then (23) implies y(3)β = 0, and (21) yields y(1)β = 0. This, in turn, gives AQγ y(1)γ = 0 by (25). Assumption (a) then gives y(1)γ = 0. Furthermore, (26) and (24) together with assumption (b) yield y(2) = 0, which also implies y(3)α = 0 by (22). Thus we have y(1) = 0 and y(3) = 0. Then it is easy to deduce that y = (y(1), y(2), y(3)) = (0, 0, 0), i.e., W0 is nonsingular. □
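Proposition 4.1 is easy to sanity-check numerically: build V a and V b from a common eigenbasis, pick eigenvalue patterns that realize the sets α, β, γ, verify assumptions (a) and (b), and test that W0 has full rank. The NumPy sketch below uses arbitrary illustrative data (ours, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2

# common eigenbasis Q and eigenvalue patterns defining alpha, beta, gamma
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))     # orthogonal Q
a = np.array([1.0, 2.0, 0.5, 1.5, 0.0])              # a_j >= 0
b = np.array([0.0, 0.0, 1.0, 2.0, 3.0])              # b_j >= 0, a_j + b_j > 0
alpha, beta, gamma = [0, 1], [2, 3], [4]             # from the sign patterns above
Va = Q @ np.diag(a) @ Q.T
Vb = Q @ np.diag(b) @ Q.T

A = rng.standard_normal((m, n))
# assumption (a): A Q_gamma has full column rank
assert np.linalg.matrix_rank(A @ Q[:, gamma]) == len(gamma)
# assumption (b): (A Q_beta, A Q_gamma) has full row rank
assert np.linalg.matrix_rank(A @ Q[:, beta + gamma]) == m

W0 = np.block([
    [np.zeros((n, n)), -A.T,             -np.eye(n)],
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [Va,               np.zeros((n, m)), Vb],
])
assert np.linalg.matrix_rank(W0) == 2 * n + m        # W0 is nonsingular
```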
We now come back to the linear SOCP. First consider the case where z∗ = (x∗, µ∗, λ∗) is a
strictly complementary KKT point, and use the index sets JI , JB, and J0 defined by (11)
in order to partition any given vector in Rn in a suitable way. Let Vi := P′Ki(x∗i − λ∗i ).
Similarly to the discussion in the previous section, we have
Vi = I ∀i ∈ JI and Vi = 0 ∀i ∈ J0. (27)
Moreover, for each i ∈ JB, let Vi = QiDiQTi be the spectral decomposition of the matrix
Vi. For the matrices Di and Qi, i ∈ JB, we use the same decomposition as in (13)–(15).
Using this notation, we are now in a position to state the following nonsingularity
result which may be viewed as a counterpart of the corresponding result for interior-point
methods given in [1, Theorem 28]. Recall that Ai denotes the submatrix of A consisting
of those columns which correspond to the block index i.
Theorem 4.2 Let z∗ = (x∗, µ∗, λ∗) be a strictly complementary KKT point of the linear
SOCP. Suppose further that the following conditions hold:
(a) The matrix (Ai (i ∈ JI), Aiq′i (i ∈ JB)) has linearly independent columns.

(b) The matrix (Ai (i ∈ JI), AiQi (i ∈ JB)) has linearly independent rows.
Then the Jacobian M′0(z∗) exists and is nonsingular.
Proof. We have

M′0(z∗) = [   0    −AT  −I
              A     0    0
            I − V   0    V  ]

for a block diagonal matrix V = diag(V1, . . . , Vr) with Vi := P′Ki(x∗i − λ∗i ). Let V a := I − V
and V b := V . It then follows that V a, V b are both symmetric positive semidefinite with
a positive definite sum V a + V b, and these two matrices obviously have a common basis
of eigenvectors. Hence we are in the situation of Proposition 4.1. In order to apply this
result, we need to take a closer look at the orthogonal matrix and the special structure of
the eigenvalues in the case that is currently under consideration.
We have
V a = Q(I −D)QT and V b = QDQT
with the block matrices
Q := diag(Q1, . . . , Qr) and D := diag(D1, . . . , Dr),

where the orthogonal blocks Qi are identity matrices for all indices i ∈ JI ∪ J0, cf. (27), whereas for all i ∈ JB, these orthogonal matrices are given by (14) or (15), and the corresponding diagonal matrices Di are given by (13).
In order to apply Proposition 4.1, we need to identify the index sets β and γ. From
(27) and (13), we see that the following indices comprise the index set β:
• all middle indices j belonging to a block index i ∈ JB, with Qi consisting of the
middle ni − 2 columns of the corresponding orthogonal matrix Qi.
Similarly, we see that the following indices comprise the index set γ:
• all indices j belonging to a block index i ∈ JI , with Qi = I being the corresponding
orthogonal matrix.
• the last component j of each block index i ∈ JB, with q′i being the last column of the
corresponding orthogonal matrix Qi.
Using these identifications, it is easy to see that assumptions (a) and (b) in Proposition
4.1 are identical to the two conditions in this theorem. Consequently the assertion follows
from Proposition 4.1. □
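The spectral structure used in this proof can also be checked numerically. For i ∈ JB the point si := x∗i − λ∗i satisfies |si0| < ‖s̄i‖, so PKi is differentiable there, with derivative given by the standard closed-form expression for the SOC projection. The NumPy sketch below (notation ours, not the paper's) evaluates this derivative, validates it against a finite-difference quotient, and confirms the eigenvalue pattern 0, then (si0 + ‖s̄i‖)/(2‖s̄i‖) with multiplicity ni − 2, then 1, matching the index sets β and γ identified above.

```python
import numpy as np

def proj_soc(z):
    z0, zb = z[0], z[1:]
    r = np.linalg.norm(zb)
    if z0 >= r:  return z.copy()
    if z0 <= -r: return np.zeros_like(z)
    t = 0.5 * (z0 + r)
    return np.concatenate(([t], (t / r) * zb))

def jac_proj_soc(s):
    """Derivative of the SOC projection at s = (s0, s_bar)
    with |s0| < ||s_bar|| (the smooth region)."""
    s0, sbar = s[0], s[1:]
    r = np.linalg.norm(sbar)
    w = sbar / r
    n = s.size
    V = np.empty((n, n))
    V[0, 0] = 0.5
    V[0, 1:] = 0.5 * w
    V[1:, 0] = 0.5 * w
    V[1:, 1:] = ((s0 + r) / (2 * r)) * np.eye(n - 1) \
                - (s0 / (2 * r)) * np.outer(w, w)
    return V

s = np.array([0.5, 2.0, 1.0, -2.0])          # |s0| < ||s_bar|| = 3
V = jac_proj_soc(s)

# finite-difference check of the derivative
h = 1e-6
d = np.random.default_rng(0).standard_normal(s.size)
fd = (proj_soc(s + h * d) - proj_soc(s - h * d)) / (2 * h)
assert np.allclose(V @ d, fd, atol=1e-6)

# eigenvalues: 0, then (s0 + r)/(2r) with multiplicity n - 2, then 1
r = np.linalg.norm(s[1:])
expected = np.sort(np.concatenate(([0.0, 1.0],
                                   np.full(s.size - 2, (s[0] + r) / (2 * r)))))
assert np.allclose(np.sort(np.linalg.eigvalsh(V)), expected)
```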
We next generalize Theorem 4.2 to the case where strict complementarity is not satisfied. In addition to JI , JB, J0, we use the index sets JB0, J0B, J00 defined by (16).
Theorem 4.3 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT point of the linear SOCP. Suppose that for any partitioning JB0 = J1B0 ∪ J2B0, any subset J20B ⊆ J0B, and any mutually disjoint subsets J100, J300, J400, J500 ⊆ J00, the following two conditions hold:
(a) The matrix

( Ai (i ∈ JI ∪ J1B0 ∪ J100), AiQi (i ∈ J2B0 ∪ J400), Aiq′i (i ∈ JB ∪ J20B ∪ J300 ∪ J500) )

has linearly independent columns.
(b) The matrix

( Ai (i ∈ JI ∪ J1B0 ∪ J100), AiQi (i ∈ JB ∪ J2B0 ∪ J300 ∪ J400), Aiq′i (i ∈ J20B ∪ J500) )

has linearly independent rows.
Then all matrices W0 ∈ ∂BM0(z∗) are nonsingular.
Proof. Choose W0 ∈ ∂BM0(z∗) arbitrarily. Then

W0 = [   0    −AT  −I
         A     0    0
       I − V   0    V  ]

for a block diagonal matrix V = diag(V1, . . . , Vr) with Vi ∈ ∂BPKi(x∗i − λ∗i ). Following the proof of Theorem 4.2, we define V a := I − V and V b := V and see that these two matrices
satisfy the assumptions of Proposition 4.1. Similarly to the proof of Theorem 4.2, we also
have
V a = Q(I −D)QT and V b = QDQT
with the block matrices
Q := diag(Q1, . . . , Qr) and D := diag(D1, . . . , Dr).
Then we need to identify the index sets α, β and γ in Proposition 4.1. To this end, we
follow the proof of Theorem 3.7 and partition the index sets JB0, J0B, and J00 further into
certain subsets in a similar manner. Then, as in the proof of Theorem 3.7, we are able
to show which indices j belong to the sets α, β, and γ. Moreover, it is not difficult to
see that the two assumptions (a) and (b) in this theorem correspond precisely to the two
requirements (a) and (b) in Proposition 4.1. We leave the details to the reader (note that
assumption (b) coincides with the corresponding assumption in Theorem 3.7, so there is
nothing to verify for this part). □
Note that a corresponding result for interior-point methods does not hold since the Jacobian
matrices arising in that context are singular whenever strict complementarity does not hold.
As a consequence of Theorem 4.3 and [19], we obtain the following theorem.
Theorem 4.4 Let z∗ = (x∗, µ∗, λ∗) be a (not necessarily strictly complementary) KKT
point of the linear SOCP, and suppose that the assumptions of Theorem 4.3 hold at this
KKT point. Then the nonsmooth Newton method zk+1 := zk − (Wk)−1M0(zk), with Wk ∈ ∂BM0(zk), applied to the system of equations M0(z) = 0 is locally quadratically convergent.
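To illustrate Theorem 4.4, here is a self-contained NumPy sketch of the iteration zk+1 := zk − (Wk)−1M0(zk) on a tiny linear SOCP with one cone K ⊆ R3, A = (1, 0, 0), b = 1, and c = (0, 1, 0), whose KKT point is x∗ = (1, −1, 0), µ∗ = −1, λ∗ = (1, 1, 0) with both x∗ and λ∗ on the boundary of K (a JB block). The problem data, the starting point, and the particular element chosen from ∂BPK at the kinks are our own illustrative choices; the case distinction in jac_proj_soc follows the standard formulas for the SOC projection.

```python
import numpy as np

def proj_soc(z):
    z0, zb = z[0], z[1:]
    r = np.linalg.norm(zb)
    if z0 >= r:  return z.copy()
    if z0 <= -r: return np.zeros_like(z)
    t = 0.5 * (z0 + r)
    return np.concatenate(([t], (t / r) * zb))

def jac_proj_soc(z):
    """One element of the B-subdifferential of P_K at z."""
    n = z.size
    z0, zb = z[0], z[1:]
    r = np.linalg.norm(zb)
    if r == 0.0:                       # kink on the axis: pick I or 0
        return np.eye(n) if z0 > 0 else np.zeros((n, n))
    if z0 >= r:  return np.eye(n)      # on bd K we pick the identity
    if z0 <= -r: return np.zeros((n, n))
    w = zb / r
    V = np.empty((n, n))
    V[0, 0] = 0.5
    V[0, 1:] = 0.5 * w
    V[1:, 0] = 0.5 * w
    V[1:, 1:] = ((z0 + r) / (2 * r)) * np.eye(n - 1) - (z0 / (2 * r)) * np.outer(w, w)
    return V

def M0(z, A, b, c):
    m, n = A.shape
    x, mu, lam = z[:n], z[n:n + m], z[n + m:]
    return np.concatenate([c - A.T @ mu - lam, A @ x - b, x - proj_soc(x - lam)])

def newton_step(z, A, b, c):
    m, n = A.shape
    x, lam = z[:n], z[n + m:]
    V = jac_proj_soc(x - lam)
    I = np.eye(n)
    W = np.block([[np.zeros((n, n)), -A.T,             -I],
                  [A,                np.zeros((m, m)), np.zeros((m, n))],
                  [I - V,            np.zeros((n, m)), V]])
    return z - np.linalg.solve(W, M0(z, A, b, c))

A = np.array([[1.0, 0.0, 0.0]])
b = np.array([1.0])
c = np.array([0.0, 1.0, 0.0])

# start near the KKT point z* = (x*, mu*, lam*) = ((1,-1,0), -1, (1,1,0))
z = np.concatenate([[1.1, -0.9, 0.1], [-0.8], [0.9, 1.1, -0.1]])
for _ in range(20):
    z = newton_step(z, A, b, c)
assert np.linalg.norm(M0(z, A, b, c)) < 1e-8
assert np.allclose(z[:3], [1.0, -1.0, 0.0], atol=1e-6)
```

The final assertions check the behavior Theorem 4.4 predicts: started close enough to z∗, the iterates drive the residual ‖M0(z)‖ to (essentially) machine precision in a handful of steps.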
5 Final Remarks
We have investigated the local properties of a semismooth equation reformulation of both
the linear and the nonlinear second-order cone programs. In particular, we have shown
nonsingularity results that provide basic conditions for local quadratic convergence of a
nonsmooth Newton method. Strict complementarity of a solution is not needed in our
nonsingularity results. Apart from these local properties, it is certainly of interest to see
how the local Newton method can be globalized in a suitable way. We leave this as a topic for future research.
References
[1] F. Alizadeh and D. Goldfarb: Second-order cone programming. Mathematical
Programming 95, 2003, pp. 3–51.
[2] A. Ben-Tal and A. Nemirovski: Lectures on Modern Convex Optimization. MPS-
SIAM Series on Optimization, SIAM, Philadelphia, PA, 2001.
[3] J.F. Bonnans and H. Ramírez C.: Perturbation analysis of second-order cone
programming problems. Mathematical Programming 104, 2005, pp. 205–227.
[4] S. Boyd and L. Vandenberghe: Convex Optimization. Cambridge University
Press, Cambridge, 2004.
[5] X.D. Chen, D. Chen, and J. Sun: Complementarity functions and numerical
experiments for second-order cone complementarity problems. Computational Opti-
mization and Applications 25, 2003, pp. 39–56.
[6] J.-S. Chen: Alternative proofs for some results of vector-valued functions associated
with second-order cones. Journal of Nonlinear and Convex Analysis 6, 2005, pp. 297–
325.
[7] J.-S. Chen, X. Chen, and P. Tseng: Analysis of nonsmooth vector-valued func-
tions associated with second-order cones. Mathematical Programming 101, 2004, pp.
95–117.
[8] J.-S. Chen and P. Tseng: An unconstrained smooth minimization reformulation
of the second-order cone complementarity problem. Mathematical Programming 104,
2005, pp. 293–327.
[9] F.H. Clarke: Optimization and Nonsmooth Analysis. John Wiley & Sons, New
York, NY, 1983 (reprinted by SIAM, Philadelphia, PA, 1990).
[10] F. Facchinei and J.S. Pang: Finite-Dimensional Variational Inequalities and
Complementarity Problems, Volume II. Springer, New York, NY, 2003.
[11] M. Fukushima, Z.-Q. Luo, and P. Tseng: Smoothing functions for second-order-
cone complementarity problems. SIAM Journal on Optimization 12, 2001, pp. 436–460.
[12] S. Hayashi, N. Yamashita, and M. Fukushima: A combined smoothing and reg-
ularization method for monotone second-order cone complementarity problems. SIAM
Journal on Optimization 15, 2005, pp. 593–615.
[13] C. Helmberg: http://www-user.tu-chemnitz.de/~helmberg/semidef.html.
[14] H. Kato and M. Fukushima: An SQP-type algorithm for nonlinear second-order
cone programs. Technical Report 2006-004, Department of Applied Mathematics and
Physics, Kyoto University, March 2006.
[15] M.S. Lobo, L. Vandenberghe, and S. Boyd: SOCP – Software for second-
order cone programming. User’s guide. Technical Report, Department of Electrical
Engineering, Stanford University, April 1997.
[16] M.S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret: Applications of
second-order cone programming. Linear Algebra and Its Applications 284, 1998, pp.
193–228.
[17] J.-S. Pang and L. Qi: Nonsmooth equations: motivation and algorithms. SIAM
Journal on Optimization 3, 1993, pp. 443–465.
[18] L. Qi: Convergence analysis of some algorithms for solving nonsmooth equations.
Mathematics of Operations Research 18, 1993, pp. 227–244.
[19] L. Qi and J. Sun: A nonsmooth version of Newton’s method. Mathematical Pro-
gramming 58, 1993, pp. 353–367.
[20] J.F. Sturm: Using SeDuMi 1.02, a MATLAB toolbox for optimization over
symmetric cones. Optimization Methods and Software 11/12, 1999, pp. 625–653.
http://sedumi.mcmaster.ca/
[21] D. Sun and J. Sun: Strong semismoothness of the Fischer-Burmeister SDC and
SOC complementarity functions. Mathematical Programming 103, 2005, pp. 575–581.
[22] P. Tseng: Smoothing methods for second-order cone programs/complementarity
problems. Talk presented at the SIAM Conference on Optimization, Stockholm, May
2005.
[23] R.H. Tütüncü, K.C. Toh and M.J. Todd: Solving semidefinite-quadratic-
linear programs using SDPT3. Mathematical Programming 95, 2003, pp. 189–217.
http://www.math.nus.edu.sg/~mattohkc/sdpt3.html
[24] R.J. Vanderbei and H. Yurttan: Using LOQO to solve second-order cone pro-
gramming problems. Technical Report, Statistics and Operations Research, Princeton
University, 1998.
[25] H. Yamashita and H. Yabe: A primal-dual interior point method for nonlinear
optimization over second order cones. Technical Report, Mathematical Systems Inc.,
Tokyo, May 2005 (revised February 2006).