A globally convergent primal-dual interior-point three-dimensional filter method for nonlinear semidefinite programming
Zhongyi Liu∗
Abstract
This paper proposes a primal-dual interior-point filter method for nonlinear semidefinite programming, the first multidimensional (three-dimensional) filter method for interior-point methods, and indeed for constrained optimization. A new definition of the filter entries is proposed that differs substantially from those used in all existing filter methods. A mixed norm is used to handle the trust-region constraints, and global convergence to first-order critical points can be proved by slightly modifying the analysis in [7].
Keywords: nonlinear semidefinite programming, interior-point method,
three-dimensional filter method, global convergence
AMS subject classification: 65K05, 90C51
1 Introduction
For the concepts and framework of filter methods, we refer to [1, 2, 3, 4] and especially to [7]. The multidimensional filter methods [5, 6] apply only to unconstrained optimization. As mentioned in [7], multidimensional filter methods for constrained optimization, and interior-point methods with a three-dimensional filter, have remained open questions until now. In the current paper we propose a three-dimensional filter method for interior-point methods. A new definition of the filter entries is proposed for the first time, in which the first component is always kept smaller than the second one; this definition differs substantially from those used in all existing filter methods. The idea is simple but valuable, since it may open the way to multidimensional filter methods for constrained optimization.

∗Correspondence to: College of Science, Hohai University, Nanjing 210098, China. E-mail: [email protected]
We consider the following nonlinear SDP problem:
    min f(x), x ∈ Rn,
    s.t. h(x) = 0,  X ≡ Ax − B ⪰ 0,   (1.1)

where the functions f : Rn → R and h : Rn → Rm are sufficiently smooth. Here A is a linear operator defined by Ax = ∑_{i=1}^n xi Ai for x ∈ Rn, where Ai ∈ Sp, i = 1, . . . , n, and B ∈ Sp are given matrices, and Sp denotes the set of pth-order real symmetric matrices. By X ⪰ 0 and X ≻ 0 we mean that the matrix X is positive semidefinite and positive definite, respectively.
Throughout this paper, we define the inner product 〈X,Z〉 by 〈X, Z〉 =
Tr(XZ) for any matrices X and Z in Sp, where Tr(M) denotes the trace of
the matrix M . We also define an adjoint operator A∗ of A such that A∗Z is an
n dimensional vector whose ith element is Tr(AiZ). Then we have
〈Ax, Z〉 = xT (A∗Z) = (A∗Z)T x,
where the superscript T denotes the transpose of a vector or a matrix. The mixed norm ‖·‖∗ is defined by ‖(u, V)‖∗ = ‖u‖ + ‖V‖F for a pair (u, V), where u is a vector, V is a matrix, ‖·‖ is the Euclidean norm, and ‖·‖F is the Frobenius norm.
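For concreteness, the mixed norm can be computed directly; the following is a small illustrative sketch (the helper name `mixed_norm` is ours, not from the paper):

```python
import numpy as np

def mixed_norm(u, V):
    """Mixed norm ||(u, V)||_* = ||u||_2 + ||V||_F for a vector u and a matrix V."""
    return np.linalg.norm(u) + np.linalg.norm(V, "fro")

u = np.array([3.0, 4.0])        # ||u||_2 = 5
V = np.array([[1.0, 2.0],
              [2.0, 1.0]])      # ||V||_F = sqrt(10)
print(mixed_norm(u, V))         # 5 + sqrt(10)
```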
2 Interior-point framework
We return to the nonlinear semidefinite programming problem posed in (1.1).
2.1 Step computation
Let the Lagrangian function of problem (1.1) be defined by
L(x, y, Z) = f(x)− yT h(x)− 〈X,Z〉,
where y ∈ Rm and Z ∈ Sp are the Lagrange multiplier vector and matrix which
correspond to the equality and positive semidefiniteness constraints, respectively.
Then the Karush–Kuhn–Tucker (KKT) conditions for optimality of problem (1.1) are given by

    ∇xL(x, y, Z) = 0,  h(x) = 0,  X ◦ Z = 0,

and

    X ⪰ 0,  Z ⪰ 0.
Here ∇xL(x, y, Z) is given by

    ∇xL(x, y, Z) = ∇f(x) − ∇h(x)y − A*Z,

and the multiplication X ◦ Z is defined by

    X ◦ Z = (XZ + ZX)/2.
It is known that X ◦ Z = 0 is equivalent to the relation XZ = ZX = 0.
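As a numerical illustration of the symmetrized product and the stated equivalence (a sketch; the helper name `jordan` is ours):

```python
import numpy as np

def jordan(X, Z):
    """Symmetrized (Jordan) product X ∘ Z = (XZ + ZX)/2; symmetric whenever X and Z are."""
    return (X @ Z + Z @ X) / 2

# For positive semidefinite X and Z, X ∘ Z = 0 forces XZ = ZX = 0,
# e.g. PSD matrices whose ranges are orthogonal:
X = np.diag([1.0, 0.0])
Z = np.diag([0.0, 2.0])
print(jordan(X, Z))   # zero matrix
print(X @ Z)          # also the zero matrix, consistent with the equivalence
```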
We call a point (x, y, Z) satisfying X ≻ 0 and Z ≻ 0 an interior point. To construct an interior-point algorithm, we introduce a positive parameter µ and replace the complementarity condition X ◦ Z = 0 by X ◦ Z = µI, where I denotes the identity matrix and µ > 0. Then we try to find a point that satisfies the following equation:
    F(x, y, Z) := ( ∇xL(x, y, Z), h(x), X ◦ Z − µI ) = 0   (2.1)

and

    X ≻ 0,  Z ≻ 0.
Throughout we will work with the perturbation σµ, where σ ∈ (0, 1) is a centering parameter and

    µ = Tr(XZ)/n.
To abbreviate notation we set
w = (x, y, Z) and ∆w = (∆x, ∆y, ∆Z).
We apply a Newton-like method to the system of equations (2.1). Let the Newton directions for the primal-dual variables be denoted by ∆x ∈ Rn, ∆y ∈ Rm, and ∆Z ∈ Sp. We define ∆X = ∑_{i=1}^n ∆xi Ai and note that ∆X ∈ Sp.
Since (X + ∆X) ◦ (Z + ∆Z) = σµI can be written as

    (X + ∆X)(Z + ∆Z) + (Z + ∆Z)(X + ∆X) = 2σµI,

neglecting the nonlinear parts ∆X∆Z and ∆Z∆X yields the Newton equations for (2.1):
G∆x−∇h(x)∆y −A∗∆Z = −∇xL(x, y, Z),
∇h(x)T ∆x = −h(x),
∆XZ + Z∆X + X∆Z + ∆ZX = 2σµI −XZ − ZX,
where G denotes the Hessian matrix of the Lagrangian function L(w) or its ap-
proximation.
We rewrite the perturbed KKT conditions in the form

    ( 0, h(x), X ◦ Z − µI ) + ( ∇xL(x, y, Z), 0, (1 − σ)µI ) = 0.   (2.2)
Now we use the decomposition associated with the splitting (2.2) to define the tangential and normal components of the trial steps. For the normal step sn = (∆xn, ∆yn, ∆Zn) we choose

    G∆xn − ∇h(x)∆yn − A*∆Zn = 0,
    ∇h(x)T ∆xn = −h(x),
    X ◦ ∆Zn + Z ◦ ∆Xn = −(X ◦ Z − µI),   (2.3)

whereas the tangential step st = (∆xt, ∆yt, ∆Zt) is given by

    G∆xt − ∇h(x)∆yt − A*∆Zt = −∇xL(w),
    ∇h(x)T ∆xt = 0,
    X ◦ ∆Zt + Z ◦ ∆Xt = −(1 − σ)µI.   (2.4)
Thus ∆w = sn + st. We will introduce different step sizes for sn and st in our trial step computation.

We introduce ∆ as the positive scalar that primarily controls the length of the step taken along ∆w, forcing the damped components αn(∆)sn and αt(∆)st to satisfy

    ‖αn(∆)sn‖∗ ≤ ∆,  ‖αt(∆)st‖∗ ≤ ∆.
Having these bounds in mind, and requiring explicitly αt(∆) ≤ αn(∆), we set

    αn(∆) = min{1, ∆/‖sn‖∗},   (2.5)

    αt(∆) = min{αn(∆), ∆/‖st‖∗} = min{1, ∆/‖sn‖∗, ∆/‖st‖∗}.   (2.6)

Here, for ∆ > 0, we use the natural conventions αn(∆) = 1 when ‖sn‖∗ = 0 and αt(∆) = αn(∆) when ‖st‖∗ = 0, i.e., min{1, ∞} = 1.
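The damped step sizes (2.5)–(2.6), together with the min{1, ∞} = 1 convention, can be sketched as follows (function and argument names are ours; the norms of sn and st are assumed precomputed):

```python
def step_sizes(delta, sn_norm, st_norm):
    """alpha_n(Delta) and alpha_t(Delta) from (2.5)-(2.6).

    Zero norms follow the convention min{1, inf} = 1, so alpha_n = 1 when
    ||s^n||_* = 0 and alpha_t = alpha_n when ||s^t||_* = 0.
    """
    alpha_n = min(1.0, delta / sn_norm) if sn_norm > 0 else 1.0
    alpha_t = min(alpha_n, delta / st_norm) if st_norm > 0 else alpha_n
    return alpha_n, alpha_t

print(step_sizes(0.5, 2.0, 0.25))   # (0.25, 0.25): both components damped
print(step_sizes(10.0, 2.0, 0.25))  # (1.0, 1.0): the full step fits inside the radius
print(step_sizes(1.0, 0.0, 0.0))    # (1.0, 1.0): the min{1, inf} = 1 convention
```

By construction alpha_t never exceeds alpha_n, matching the requirement αt(∆) ≤ αn(∆) above.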
Let also
w(∆) = (x(∆), y(∆), Z(∆)) = w + αn(∆)sn + αt(∆)st,
s(∆) = (sx(∆), sy(∆), sZ(∆)) = w(∆)− w = αn(∆)sn + αt(∆)st.
Thus,
‖s(∆)‖∗ ≤ 2∆,
and here ∆ plays a role comparable to the trust-region radius.
The scalars αn(∆) and αt(∆) define the step sizes taken along the normal and tangential directions, respectively, and will be such that positivity and some measure of centrality of the new iterate w(∆) are maintained. However, both αn(∆) and αt(∆) depend on ∆, which in turn will be adjusted not only to maintain positivity and centrality but also to enforce global convergence.
We introduce the notation

    θh(w) := ‖h(x)‖,  θc(w) := ‖X ◦ Z − (Tr(XZ)/n)I‖F,  θ(w) := θh(w) + θc(w),
    θL(w) := ‖∇xL(w)‖,  θg(w) := Tr(XZ)/n + ‖∇xL(w)‖².

In particular, we write the filter components as

    θ1(w) := min{θh(w), θc(w)},  θ2(w) := max{θh(w), θc(w)},
    θ3(w) := θg(w) = Tr(XZ)/n + ‖∇xL(w)‖².
Since X ◦ Z might not be the zero matrix, a point w satisfying θc(w) = θh(w) = θL(w) = 0 and X ⪰ 0, Z ⪰ 0 might not be a KKT point. The definition of θg(w), however, guarantees that a point w verifying θc(w) = θh(w) = θg(w) = 0, which is the same as θ1(w) = θ2(w) = θ3(w) = 0, together with X ⪰ 0, Z ⪰ 0, is indeed a KKT point.
With the purpose of achieving a reduction in the function θg, we introduce, at a given point w, the quadratic model

    m(w(∆)) = Tr(XZ)/n + (Tr((X(∆) − X)Z) + Tr((Z(∆) − Z)X))/n
              + ‖∇xL(w) + ∇²xwL(w)(w(∆) − w)‖²
            = (Tr(X(∆)Z(∆)) − Tr((X(∆) − X)(Z(∆) − Z)))/n
              + ‖∇xL(w) + ∇²xwL(w)(w(∆) − w)‖²,

obtained by adding to the linearization of Tr(XZ)/n the squared norm of the linearization of ∇xL(w). To shorten notation we also set

    µ(∆) = Tr(X(∆)Z(∆))/n.
In order to prevent (X(∆), Z(∆)) from approaching the boundary of the positive cone too rapidly, we will keep the iterates in the neighborhood

    N(γ, M) = { w : X ≻ 0, Z ≻ 0, X ◦ Z ⪰ γ(Tr(XZ)/n)I, θh(w) + θL(w) ≤ M Tr(XZ)/n }

with fixed γ ∈ (0, 1) and M > 0. We will show in the next subsection that w ∈ N(γ, M) implies w(∆) ∈ N(γ, M) whenever ∆ ∈ (0, ∆min] for a given constant ∆min > 0.
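Membership in N(γ, M) can be tested directly from the definition; a sketch (names are ours; θh(w) and θL(w) are passed in precomputed, and a small tolerance guards the eigenvalue tests):

```python
import numpy as np

def in_neighborhood(X, Z, theta_h, theta_L, gamma, M, tol=1e-12):
    """Check w ∈ N(gamma, M): X, Z ≻ 0, X ∘ Z ⪰ gamma*mu*I, theta_h + theta_L ≤ M*mu."""
    n = X.shape[0]
    mu = np.trace(X @ Z) / n
    XZ_sym = (X @ Z + Z @ X) / 2
    pos_def = np.linalg.eigvalsh(X)[0] > tol and np.linalg.eigvalsh(Z)[0] > tol
    centered = np.linalg.eigvalsh(XZ_sym - gamma * mu * np.eye(n))[0] >= -tol
    bounded = theta_h + theta_L <= M * mu + tol
    return bool(pos_def and centered and bounded)

print(in_neighborhood(np.eye(2), 2 * np.eye(2), 0.0, 0.0, gamma=0.5, M=1.0))  # True
print(in_neighborhood(np.eye(2), 2 * np.eye(2), 5.0, 0.0, gamma=0.5, M=1.0))  # False
```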
2.2 Step estimates
Lemma 2.1. Let A, B ∈ Sn, let λi(A) denote the eigenvalues of A, let Λ = diag(λ(A)) with A = QΛQT for an orthogonal matrix Q, and let ρ(A) be the spectral radius of A, i.e., ρ(A) = max{|λi(A)|, i = 1, . . . , n}. Then

    A ◦ B ⪯ ρ(A)ρ(B)I.

Proof. Since

    2ρ(A)ρ(B)I − (QT AQ QT BQ + QT BQ QT AQ)
      = ρ(A)ρ(B)I − ΛQT BQ + ρ(A)ρ(B)I − QT BQΛ
      ⪰ ρ(B)Λ − ΛQT BQ + ρ(B)Λ − QT BQΛ
      = Λ(ρ(B)I − QT BQ) + (ρ(B)I − QT BQ)Λ
      ⪰ 0,

the lemma follows easily.
Lemma 2.2. For all ∆ > 0 and all i = 1, . . . , n it holds that

    λi(X(∆) ◦ Z(∆)) ≤ (1 − αn(∆))λi(X ◦ Z) + (αn(∆) − αt(∆)(1 − σ))µ + 4∆²,   (2.7)
    λi(X(∆) ◦ Z(∆)) ≥ (1 − αn(∆))λi(X ◦ Z) + (αn(∆) − αt(∆)(1 − σ))µ − 4∆²,   (2.8)
    µ(∆) ≤ (1 − αt(∆)(1 − σ))µ + 4∆²,   (2.9)
    µ(∆) ≥ (1 − αt(∆)(1 − σ))µ − 4∆².   (2.10)

Proof. By the definition of sn and st, we have

    X(∆) ◦ Z(∆) = (X + αn(∆)∆Xn + αt(∆)∆Xt) ◦ (Z + αn(∆)∆Zn + αt(∆)∆Zt)
      = X ◦ Z + αn(∆)(∆Xn ◦ Z + X ◦ ∆Zn) + αt(∆)(∆Xt ◦ Z + X ◦ ∆Zt)
        + (αn(∆)∆Xn + αt(∆)∆Xt) ◦ (αn(∆)∆Zn + αt(∆)∆Zt)
      = X ◦ Z − αn(∆)(X ◦ Z − µI) − αt(∆)(1 − σ)µI + (X(∆) − X) ◦ (Z(∆) − Z).

Note that ρ(X(∆) − X) ≤ ‖X(∆) − X‖F ≤ 2∆, and similarly ρ(Z(∆) − Z) ≤ 2∆. Then, by Lemma 2.1,

    |λi((X(∆) − X) ◦ (Z(∆) − Z))| ≤ ρ(X(∆) − X)ρ(Z(∆) − Z) ≤ (2∆)².

Hence the relations (2.7) and (2.8) follow.

Summing (2.7) and (2.8) over all i, dividing the result by n, and using µ = Tr(XZ)/n and µ(∆) = Tr(X(∆)Z(∆))/n yields (2.9) and (2.10).
The following lemma provides, at w(∆), upper bounds on the values of θh, θc, θL
and θg in terms of ∆ and of the corresponding values at w. It also provides a
lower bound for the decrease produced on the quadratic model m by the step
w(∆)− w.
Lemma 2.3. There exist positive constants Mh, Mc, and ML, depending on an upper bound for θL and on the Lipschitz constants of ∇h and ∇²xwL, such that, for all ∆ > 0,

    θh(w(∆)) ≤ (1 − αn(∆))θh(w) + Mh∆²,   (2.11)
    θc(w(∆)) ≤ (1 − αn(∆))θc(w) + Mc∆²,   (2.12)
    θL(w(∆)) ≤ (1 − αt(∆))θL(w) + ML∆².   (2.13)

For ∆ satisfying 0 < ∆ ≤ ∆ub, it also holds that

    θg(w(∆)) ≤ (1 − αt(∆)(1 − σ))θg(w) + Mg∆²   (2.14)

for some positive constant Mg.

Finally, for any ∆ > 0, we also have

    m(w) − m(w(∆)) ≥ αt(∆)(1 − σ)θg(w).   (2.15)
Proof. Denote by BL an upper bound for θL, and by Ch and CL′ > 1 Lipschitz constants for ∇h and ∇²xwL, respectively. We will prove this lemma with

    Mh = 2Ch,  Mc = 8√n,  ML = 2CL′,  Mg = 4(1 + BL CL′) + 4CL′²∆ub².
Note that ∇h(x)T sx(∆) = −αn(∆)h(x) from (2.3), (2.4), and the definition of s(∆). Thus

    θh(w(∆)) = ‖h(x(∆))‖ = ‖h(x) + ∫₀¹ ∇h(x + t sx(∆))T sx(∆) dt‖
      = ‖(1 − αn(∆))h(x) + ∫₀¹ (∇h(x + t sx(∆)) − ∇h(x))T sx(∆) dt‖
      ≤ (1 − αn(∆))θh(w) + Ch‖sx(∆)‖∗² ∫₀¹ t dt,

which proves (2.11).
Similarly, we have ∇²xwL(w)T s(∆) = −αt(∆)∇xL(w) from (2.3), (2.4), and the definition of s(∆); proceeding as above, we get

    θL(w(∆)) ≤ (1 − αt(∆))θL(w) + ∫₀¹ ‖∇²xwL(w + t s(∆)) − ∇²xwL(w)‖ ‖s(∆)‖∗ dt,

which yields (2.13).
From Lemma 2.2,

    ±λi(X(∆) ◦ Z(∆) − µ(∆)I) ≤ ±((1 − αn(∆))λi(X ◦ Z) + (αn(∆) − αt(∆)(1 − σ))µ) + 4∆²
        ∓ (1 − αt(∆)(1 − σ))µ + 4∆²
      = ±(1 − αn(∆))(λi(X ◦ Z) − µ) + 8∆².

Then (2.12) follows.
From Lemma 2.2 and inequality (2.13),

    θg(w(∆)) = µ(∆) + θL(w(∆))²
      ≤ (1 − αt(∆)(1 − σ))µ + 4∆² + ((1 − αt(∆))θL(w) + 2CL′∆²)²
      ≤ (1 − αt(∆)(1 − σ))θg(w) + (4 + 4(1 − αt(∆))θL(w))CL′∆² + 4CL′²∆⁴,

where the second inequality uses the relation (1 − αt(∆))² ≤ 1 − αt(∆)(1 − σ). Then (2.14) follows.
Finally, we have

    m(w) − m(w(∆)) = (Tr(XZ)/n + ‖∇xL(w)‖²)
        − ((Tr(X(∆)Z(∆)) − Tr((X(∆) − X)(Z(∆) − Z)))/n + ‖∇xL(w) + ∇²xwL(w)(w(∆) − w)‖²)
      = µ − µ(∆) + Tr((X(∆) − X)(Z(∆) − Z))/n + (1 − (1 − αt(∆))²)‖∇xL(w)‖²
      = αt(∆)(1 − σ)µ + (1 − (1 − αt(∆))²)‖∇xL(w)‖²
      ≥ αt(∆)(1 − σ)θg(w),

where the second equality is due to the relation ∇²xwL(w)s(∆) = −αt(∆)∇xL(w) and the inequality is due to 1 − (1 − αt(∆))² ≥ (1 − σ)αt(∆). The proof is therefore complete.
By introducing E = µ(X⁻¹ ⊛ X⁻¹) and taking F as the identity, we see that the Newton system for equation (2.1) can be transformed into

    [ ∇xxL(x, y, Z)   −∇h(x)   −A* ] [ ∆x ]       [ ∇xL(x, y, Z) ]
    [ ∇h(x)T           0         0  ] [ ∆y ]  = −  [ h(x)          ]
    [ EA               0         F  ] [ ∆Z ]       [ X ◦ Z − µI    ]

that is,

    F′(x, y, Z)∆w = −F(x, y, Z).
Now we state a result saying that if the current point w = (x, y, Z) satisfies the centrality requirement X ◦ Z ⪰ γµI, then so does the next point w(∆) = (x(∆), y(∆), Z(∆)), provided ∆ is sufficiently small. A similar property is also stated for the inequalities θh(w) + θL(w) ≤ Mµ and (X, Z) ≻ 0.
Lemma 2.4. Let ‖F′(w)⁻¹‖ ≤ C and, for γ ∈ (0, 1) and M > 0, assume that

    X ◦ Z ⪰ γµI,  θh(w) + θL(w) ≤ Mµ.

Then there exists a constant ∆min such that, if 0 < ∆ ≤ ∆min, then

    X(∆) ◦ Z(∆) ⪰ γµ(∆)I,   (2.16)
    θh(w(∆)) + θL(w(∆)) ≤ Mµ(∆).   (2.17)

Furthermore, if (X, Z) ≻ 0, then, for all ∆ ∈ (0, ∆min],

    (X(∆), Z(∆)) ≻ 0.   (2.18)

Thus, w ∈ N(γ, M) implies w(∆) ∈ N(γ, M) for all ∆ ∈ (0, ∆min].
Proof. We will prove this lemma with

    ∆min = min{ σ(1 − γ) / (4(1 + γ)C(M + n)), σM / ((Mh + ML + 4M)C(M + n)) }.   (2.19)

1. We first show that (2.16) holds for all ∆ ∈ (0, ∆min] with ∆min satisfying (2.19). The proof is divided into two stages: first we show that (2.16) holds whenever (2.25) holds, and then we show that (2.25) holds for all ∆ ∈ (0, ∆min].
From Lemma 2.2 we obtain

    λi(X(∆) ◦ Z(∆)) ≥ (γ + (1 − γ)αn(∆) − αt(∆)(1 − σ))µ − 4∆².   (2.20)

On the other hand, Lemma 2.2 also yields

    γµ(∆) ≤ γµ − γαt(∆)(1 − σ)µ + 4γ∆².   (2.21)

Hence, λi(X(∆) ◦ Z(∆)) ≥ γµ(∆) holds whenever

    4∆² ≤ (1 − γ)(αn(∆) − αt(∆)(1 − σ))µ / (1 + γ).   (2.22)

Since αt(∆) ≤ αn(∆), a sufficient condition for this inequality to hold is

    ∆² ≤ αn(∆)σ(1 − γ)µ / (4(1 + γ)),

which, by (2.5), is implied by

    ∆ ≤ min{ √(σ(1 − γ)µ / (4(1 + γ))), σ(1 − γ)µ / (4(1 + γ)‖sn‖∗) },

which in turn is true if

    ∆ ≤ δ1(µ) := min{ √(σ(1 − γ)µ / (4(1 + γ))), σ(1 − γ) / (4(1 + γ)C(M + n)) },   (2.23)

since

    ‖X ◦ Z − µI‖F² = ∑_{i=1}^n λi²(X ◦ Z) − 2µ ∑_{i=1}^n λi(X ◦ Z) + nµ²
      = ∑_{i=1}^n λi²(X ◦ Z) − nµ²
      ≤ (∑_{i=1}^n λi(X ◦ Z))² − nµ²
      = (n² − n)µ²,

and then

    ‖sn‖∗ = ‖(∆xn, ∆yn, ∆Zn)‖∗ ≤ C(‖h(x)‖ + ‖X ◦ Z − µI‖F)
      = C(θh(w) + ‖X ◦ Z − µI‖F)
      ≤ C(M + (n² − n)^{1/2})µ
      ≤ C(M + n)µ.

On the other hand, we can also deduce that ‖st‖∗ ≤ C(M + n^{1/2})µ ≤ C(M + n)µ, and therefore

    δ := max{‖sn‖∗, ‖st‖∗} ≤ C(M + n)µ.   (2.24)
We now consider two possible cases in order to show that (2.16) holds whenever

    min{∆, δ} ≤ δ1(µ).   (2.25)

In the first case, ∆ ≤ δ, we use the fact, just shown, that (2.16) holds provided ∆ ≤ δ1(µ). Since ∆ ≤ δ, the inequality ∆ ≤ δ1(µ) is equivalent to (2.25).

In the case δ < ∆, we know that αn(∆) = αt(∆) = 1 by (2.5) and (2.6). Note that

    αn(δ) = min{1, δ/‖sn‖∗} = 1,  αt(δ) = min{αn(δ), δ/‖st‖∗} = 1,

so w(∆) = w(δ) and s(∆) = s(δ). Thus, (2.16) is the same as X(δ) ◦ Z(δ) ⪰ γµ(δ)I. One can apply the same algebraic arguments used in the first part of the proof to obtain a result similar to (2.23), namely that X(δ) ◦ Z(δ) ⪰ γµ(δ)I holds if δ ≤ δ1(µ). Now, since δ < ∆, the condition δ ≤ δ1(µ) is also equivalent to (2.25). The first stage is complete.
Hence, it remains to show that (2.25) holds for all ∆ ∈ (0, ∆min] with ∆min from (2.19). Let

    µ1crit := σ(1 − γ) / (4(1 + γ)C²(M + n)²).

If µ ≤ µ1crit, then δ1(µ) is given by the first expression in (2.23), and since

    δ ≤ C(M + n)µ ≤ √(C²(M + n)²µ1crit µ) = √(σ(1 − γ)µ / (4(1 + γ))) = δ1(µ),

(2.25) follows.

If µ > µ1crit, then δ1(µ) is given by the second expression in (2.23), that is, δ1(µ) = σ(1 − γ) / (4(1 + γ)C(M + n)). Hence, if ∆ ≤ ∆min, then ∆ ≤ σ(1 − γ) / (4(1 + γ)C(M + n)) = δ1(µ), and (2.25) holds.
2. We now prove that (2.17) holds for all ∆ ∈ (0, ∆min] with ∆min from (2.19).

From Lemma 2.3 and αt(∆) ≤ αn(∆) we derive

    θL(w(∆)) ≤ (1 − αt(∆))θL(w) + ML∆²,
    θh(w(∆)) ≤ (1 − αt(∆))θh(w) + Mh∆².

Using θh(w) + θL(w) ≤ Mµ, we get

    θh(w(∆)) + θL(w(∆)) ≤ (1 − αt(∆))Mµ + (Mh + ML)∆².

On the other hand, by Lemma 2.2,

    Mµ(∆) ≥ (1 − αt(∆))Mµ + σαt(∆)Mµ − 4M∆².

Therefore, (2.17) holds whenever

    (Mh + ML + 4M)∆² ≤ σαt(∆)Mµ,

which, by (2.6), is implied by

    ∆ ≤ min{ √(σMµ / (Mh + ML + 4M)), σMµ / ((Mh + ML + 4M) max{‖sn‖∗, ‖st‖∗}) },

which in turn is true, by (2.24), if

    ∆ ≤ δ2(µ) := min{ √(σMµ / (Mh + ML + 4M)), σM / ((Mh + ML + 4M)C(M + n)) }.

Let

    µ2crit := σM / ((Mh + ML + 4M)C²(M + n)²).

If µ ≤ µ2crit, then δ2(µ) = √(σMµ / (Mh + ML + 4M)), and then, by (2.24),

    δ ≤ C(M + n)µ ≤ √(C²(M + n)²µ2crit µ) = √(σMµ / (Mh + ML + 4M)) = δ2(µ).

If µ > µ2crit, then δ2(µ) = σM / ((Mh + ML + 4M)C(M + n)). Hence, if ∆ ≤ ∆min, then ∆ ≤ σM / ((Mh + ML + 4M)C(M + n)), so min{∆, δ} ≤ σM / ((Mh + ML + 4M)C(M + n)) = δ2(µ). The same argument as in the first part then implies that (2.17) holds for all ∆ ∈ (0, ∆min].
3. Finally, we prove that (2.18) holds for all ∆ ∈ (0, ∆min] with ∆min from (2.19). We know that (2.20) and (2.21) hold for all ∆ ∈ (0, ∆min] with ∆min satisfying (2.19). It follows from (2.22) that

    4∆² < (αn(∆) − αt(∆)(1 − σ))µ.

Thus, from (2.20), we get

    X(∆) ◦ Z(∆) ≻ γ(1 − αn(∆))µI ⪰ 0.

Hence, (2.18) follows.
3 The interior-point filter method
We choose θ1, θ2 and θ3, instead of θh, θc and θg, to form a filter entry, where
θh measures feasibility, θc measures centrality and θg measures optimality. Since
θ3 = θg, we replace the objective function value by the criticality measure θg.
Definition 3.1. (Dominance). A point w, or the corresponding triple (θ1(w), θ2(w), θ3(w)), is said to dominate a point w′, or the corresponding triple (θ1(w′), θ2(w′), θ3(w′)), if

    θ1(w) ≤ θ1(w′),  θ2(w) ≤ θ2(w′),  and θ3(w) ≤ θ3(w′),

or, alternatively, if the following inequality is violated:

    max{θ1(w) − θ1(w′), θ2(w) − θ2(w′), θ3(w) − θ3(w′)} > 0.
Definition 3.2. (Filter). A filter is a finite subset F ⊂ R³ consisting of triples (θf1, θf2, θf3) such that no triple dominates any of the others.

The mere requirement that a new iterate is not dominated by any of the filter entries is too weak as an acceptance criterion; instead, we require:

Definition 3.3. Let γF ∈ (0, 1/2) be fixed. The point w is acceptable to the filter F if, for all (θf1, θf2, θf3) ∈ F, it holds that

    max{θf1 − θ1(w), θf2 − θ2(w), θf3 − θ3(w)} > γFτθf,

where θf := θfh + θfc = θf1 + θf2 and τ > 0 is sufficiently small but bounded away from zero.
In the course of the algorithm, we will add selected new points to the filter. This is done in the following way.

Definition 3.4. By adding w to the filter F we mean the operation

    F → {(θ1(w), θ2(w), θ3(w))} ∪ {(θf1, θf2, θf3) ∈ F : min{θf1 − θ1(w), θf2 − θ2(w), θf3 − θ3(w)} < 0}.
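Definitions 3.3 and 3.4 translate directly into code. The sketch below (the data layout and all names are ours) represents the filter as a list of triples:

```python
def acceptable(point, filt, gamma_F=0.1, tau=0.01):
    """Definition 3.3: (t1, t2, t3) is acceptable to the filter if, for every
    entry (f1, f2, f3), some component improves by the margin gamma_F*tau*(f1+f2)."""
    t1, t2, t3 = point
    return all(max(f1 - t1, f2 - t2, f3 - t3) > gamma_F * tau * (f1 + f2)
               for (f1, f2, f3) in filt)

def add_to_filter(point, filt):
    """Definition 3.4: insert `point` and keep only the old entries that still
    beat the new point in at least one component (the rest are dominated)."""
    t1, t2, t3 = point
    kept = [(f1, f2, f3) for (f1, f2, f3) in filt
            if min(f1 - t1, f2 - t2, f3 - t3) < 0]
    return [point] + kept

filt = [(1.0, 2.0, 3.0)]
print(acceptable((0.5, 1.0, 3.5), filt))   # True: improves the first two components
filt = add_to_filter((0.5, 1.0, 3.5), filt)
print(filt)                                # the old entry survives: it wins on theta3
```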
Now we state the algorithm below.
Algorithm 3.5. Primal-dual interior-point filter method

step 0. Choose σ ∈ (0, 1), ν ∈ (0, 1), γ1, γ2, τ > 0, 0 < β, η, κ < 1, and γF ∈ (0, 1/2). Set F := ∅. Choose (X0, Z0) ≻ 0 and y0, and determine γ ∈ (0, 1) such that X0 ◦ Z0 ⪰ γµ0I with µ0 = Tr(X0Z0)/n. Furthermore, choose M > 0 such that θh(w0) + θL(w0) ≤ Mµ0. Choose ∆in_0 > 0 and set k := 0.

step 1. Set µk := Tr(XkZk)/n and compute sn_k and st_k by solving the linear systems (2.3) and (2.4), respectively, with (w, µ) = (wk, µk).

step 2. Compute ∆′k ∈ [0, ∆in_k] such that

    Xk(∆) ≻ 0, Zk(∆) ≻ 0, Xk(∆) ◦ Zk(∆) ⪰ γµkI for all ∆ ∈ [0, ∆′k].

step 3. Compute the largest ∆″k ∈ (0, ∆′k] such that

    θh(wk(∆″k)) + θL(wk(∆″k)) ≤ Mµk(∆″k).

Set ∆k := ∆″k.

step 4. If θ(wk) ≤ ∆k min{γ1, γ2∆k^β}, go to step 5. Otherwise, add wk to the filter and call a restoration algorithm that produces a point wk+1 such that:
wk+1 ∈ N(γ, M) with µk+1 = Tr(Xk+1Zk+1)/n;
wk+1 is acceptable to the filter;
θ(wk+1) ≤ ∆k+1 min{γ1, γ2∆k^β} with ∆k+1 = ∆k.
Set ∆in_{k+1} := ∆k, k := k + 1, and go to step 1.

step 5. If wk(∆k) is not acceptable to the filter (with wk considered in the filter when mk(wk) − mk(wk(∆k)) < κθ(wk)²), go to step 11.

step 6. If mk(wk) − mk(wk(∆k)) = 0, set ρk := 0. Otherwise, set

    ρk := (θg(wk) − θg(wk(∆k))) / (mk(wk) − mk(wk(∆k))).

step 7. If ρk < η and mk(wk) − mk(wk(∆k)) ≥ κθ(wk)², go to step 11.

step 8. If mk(wk) − mk(wk(∆k)) < κθ(wk)², add wk to the filter.

step 9. Choose ∆in_{k+1} ≥ ∆k.

step 10. Set wk+1 := wk(∆k), k := k + 1, and go to step 1.

step 11. Set wk+1 := wk, sn_{k+1} := sn_k, st_{k+1} := st_k, and ∆′_{k+1} := ∆k/2. Set k := k + 1 and go to step 3.
Remark 3.6. In step 2 and step 3, we use an Armijo-like backtracking procedure to compute ∆′k and ∆″k. Note that step 5 guarantees that the potentially new iterate wk(∆k) is always acceptable to the filter. For the restoration algorithm, see [7].
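The Armijo-like searches in step 2 and step 3 can be sketched as repeated halving of a trial radius until the required conditions hold. Everything below (the function name, the predicate interface, the halving factor) is our illustration, not the paper's implementation:

```python
def largest_radius(delta_in, conditions_hold, shrink=0.5, max_iter=60):
    """Shrink a trial radius geometrically until conditions_hold(delta) is True,
    mimicking the Armijo-like computation of Delta'_k (step 2) and Delta''_k (step 3)."""
    delta = delta_in
    for _ in range(max_iter):
        if conditions_hold(delta):
            return delta
        delta *= shrink
    return 0.0

# Toy predicate standing in for "positivity and centrality hold for all d <= delta":
ok = lambda d: d <= 0.3
print(largest_radius(1.0, ok))   # 0.25 after two halvings
```

In practice `conditions_hold` would evaluate the positivity/centrality tests of step 2 or the bound of step 3 at the trial point wk(∆).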
4 Global convergence
Now we assume that {wk} is a sequence of iterates generated by Algorithm 3.5.
We will also impose the following assumptions.
Assumption 4.1.
(A1) The sequence {(xk, yk, Zk)} is bounded.
(A2) The derivatives ∇h and ∇²xwL exist and are Lipschitz continuous in an open set containing all iterates (xk, yk, Zk) and the line segments [wk, wk + sk(∆k)].
(A3) There exists C > 0 such that ‖F′(wk)⁻¹‖ ≤ C for all k.
The following lemma is a direct consequence of the assumptions, Lemma 2.3
and Lemma 2.4.
Lemma 4.2. [7, Lemma 3] The following hold:
i) The sequences {θh(wk)}, {θc(wk)}, {µk}, and {θg(wk)} are bounded.
ii) The constants Mh, Mc, ML, Mg in Lemma 2.3 are bounded for all k.
iii) There exists ∆min > 0 such that the conditions in step 2 and step 3 are satisfied for all ∆′k, ∆″k ∈ [0, ∆min]. Thus, step 2 and step 3 leave ∆in_k unchanged for 0 ≤ ∆in_k ≤ ∆min, and we have ∆k = ∆in_k.
iv) It holds that max{‖sn_k‖∗, ‖st_k‖∗} ≤ C(M + (n² − n + 1)^{1/2})µk for all k.
Lemma 4.3. [7, Lemma 4] If wk is added to the filter, then θ(wk) > 0.

Lemma 4.4. [7, Lemma 5] In all iterations k ≥ 0, the current iterate wk is acceptable to the filter.

Lemma 4.5. [7, Lemma 6] There exists a ∆r > 0 such that, if ∆k ≤ ∆r in step 5, it holds that

    θ(wk(∆k)) ≤ (Mh + Mc)∆k².
Lemma 4.6. Suppose that θ(wk) + θg(wk) ≥ ε > 0 for all k. There exists ∆a > 0, depending only on ε and on the values of the filter entries, such that, if

    0 < ∆k ≤ ∆a,

then wk(∆k) is acceptable to the filter in step 5 (with wk considered in the filter when mk(wk) − mk(wk(∆k)) < κθ(wk)²).
Proof. One has from Lemma 4.3 that

    θF := min_F (1 − γF)θf > 0.

Consider first the case θ(wk) ≥ ε/2. Then wk(∆k) is acceptable to the filter (with wk considered in the filter when mk(wk) − mk(wk(∆k)) < κθ(wk)²) if either

    θ1(wk) − θ1(wk(∆k)) ≥ τγFθ(wk) and θf1 − θ1(wk(∆k)) ≥ τγFθf,

or

    θ2(wk) − θ2(wk(∆k)) ≥ τγFθ(wk) and θf2 − θ2(wk(∆k)) ≥ τγFθf.

These conditions hold if either

    θ1(wk(∆k)) ≤ (1/2) min{θ1(wk) − τγFθ(wk), θf1 − τγFθf}   (4.1)

or

    θ2(wk(∆k)) ≤ (1/2) min{(1 − γF)τε/2, τθF} < min{(1 − γF)τε/2, τθF},   (4.2)

where (4.2) uses θ1 ≤ θ/2 ≤ θ2 ≤ θ and θf1 ≤ θf/2 ≤ θf2 ≤ θf. The right-hand side of (4.1) is nonnegative whenever θ1 ≥ τθ and θf1 ≥ τθf; in that case, if

    θ1(wk(∆k)) ≤ (1/2) min{(1 − γF)τε/2, τθF} < min{(1 − γF)τε/2, τθF},   (4.3)

then wk(∆k) is acceptable. Moreover, from Lemma 4.5,

    θ(wk(∆k)) ≤ (Mh + Mc)∆k²

holds for ∆k ≤ ∆r. Thus (4.2), or (4.1) together with (4.3), is satisfied for ∆k ≤ ∆a^(1), with ∆a^(1) > 0 depending only on θF, ε, Mh, Mc, γF, and ∆r.

Now consider the case θg(wk) ≥ ε/2. If wk is not considered in the filter in step 5, then either

    θ1(wk(∆k)) ≤ (1/2)(θf1 − τγFθf)  or  θ2(wk(∆k)) ≤ (1/2)(θf2 − τγFθf)

shows, as above, that wk(∆k) is acceptable to the filter whenever ∆k ≤ ∆a^(1). On the other hand, with wk considered in the filter when mk(wk) − mk(wk(∆k)) < κθ(wk)², acceptability holds if, in addition,

    θg(wk(∆k)) − θg(wk) < −γFτθ(wk).   (4.4)

Since step 5 is reached, we know that

    θ(wk) ≤ γ2∆k^{1+β}.

From θg(wk) ≥ ε/2 and Lemma 2.3 with 0 < ∆k ≤ ∆ub we obtain

    θg(wk(∆k)) − θg(wk) ≤ −(1 − σ)αt_k ε/2 + Mg∆k².

Hence it suffices to show that

    −(1 − σ)αt_k ε/2 + Mg∆k² < −γFτγ2∆k^{1+β}.

Since ‖sn_k‖∗ and ‖st_k‖∗ are bounded by a constant Ms and αt_k = min{1, ∆k/‖sn_k‖∗, ∆k/‖st_k‖∗}, we have αt_k ≥ ∆k/Ms for all ∆k ≤ Ms. Thus (4.4) holds if

    Mg∆k + γFτγ2∆k^β ≤ (1 − σ)ε/(4Ms) < (1 − σ)ε/(2Ms),

which in turn holds for all ∆k ≤ ∆a^(2), with ∆a^(2) > 0 depending only on ε, Mg, γF, τ, γ2, β, σ, Ms, and ∆ub. Taking ∆a = min{∆a^(1), ∆a^(2)} concludes the proof.
Lemma 4.7. Suppose that, for a given ε > 0,

    θg(wk) ≥ ε and θ(wk) > (∆k/2) min{γ1, γ2(∆k/2)^β}.   (4.5)

Then there exists ∆f > 0 such that, if

    0 < ∆k ≤ ∆f,

then wk(∆k) is acceptable to the filter in step 5 (with wk considered in the filter when mk(wk) − mk(wk(∆k)) < κθ(wk)²).
Proof. The proof is very similar to the proof of Lemma 8 in [7].
Lemma 4.8. [7, Lemma 9] Beginning from the moment when a point wk is added to the filter, the filter always contains an entry that dominates wk.
We first investigate what happens if infinitely many values are added to the
filter in the course of the algorithm.
Lemma 4.9. Suppose that infinitely many values of (θ1, θ2, θ3) are added to the filter. Then there exists a subsequence {k_i} such that wk_i is added to the filter and

    lim_{i→∞} θh(wk_i) = lim_{i→∞} θc(wk_i) = 0.

Proof. Assume, for the purpose of a contradiction, that there exists a subsequence {k_j} ⊆ {k_i} such that

    θh(wk_j) + θc(wk_j) ≥ ε1

for some ε1 > 0. Since the assumptions imply that θh(wk_j), θc(wk_j), and θg(wk_j) are all bounded above and below, there must exist a further subsequence {k_l} ⊆ {k_j} such that

    lim_{l→∞} (θh(wk_l) + θc(wk_l)) = θh(w∞) + θc(w∞),   (4.6)

with

    θh(w∞) + θc(w∞) ≥ ε1.

Moreover, by Lemma 4.4, wk_l is acceptable to the filter for every l, which implies that for each l at least one of the following three inequalities holds:

    θ1(wk_l) − θ1(wk_{l−1}) < −γFτθ(wk_l),
    θ2(wk_l) − θ2(wk_{l−1}) < −γFτθ(wk_l),
    θ3(wk_l) − θ3(wk_{l−1}) < −γFτθ(wk_l).   (4.7)

Hence we deduce from (4.6) that, for l sufficiently large, at least one of the following three inequalities holds:

    θ1(wk_l) − θ1(wk_{l−1}) < −γFτε1/2,
    θ2(wk_l) − θ2(wk_{l−1}) < −γFτε1/2,
    θ3(wk_l) − θ3(wk_{l−1}) < −γFτε1/2.

But for whichever of the three holds, the left-hand side tends to zero as l tends to infinity because of (4.6), yielding the contradiction. Hence

    lim_{i→∞} θh(wk_i) = lim_{i→∞} θc(wk_i) = 0.
Since iterates are added to the filter only if restoration is invoked or in step 8, the sequence {k_i} in the above lemma must contain either a subsequence where restoration is invoked, or a subsequence where the iterates are added to the filter in step 8. We start with the first case.
Lemma 4.10. [7, Lemma 11] Suppose that there exists an infinite sequence {k_i} of iterations at which restoration is invoked and for which

    lim_{i→∞} θ(wk_i) = 0.

Then {k_i} contains a subsequence {k′_j} with

    lim_{j→∞} θ(wk′_j) = 0,  lim_{j→∞} θg(wk′_j) = 0.
The other situation is the one in which the sequence {k_i} of Lemma 4.9 contains a subsequence where the iterates are added to the filter in step 8.

Lemma 4.11. [7, Lemma 12] Suppose that there exists an infinite sequence {k_i} of iterations for which wk_i is added to the filter in step 8 and, in addition, lim_{i→∞} θ(wk_i) = 0. Then {k_i} contains a subsequence {k′_j} such that

    lim_{j→∞} θ(wk′_j) = 0,  lim_{j→∞} θg(wk′_j) = 0.
We summarize both situations in the next theorem.

Theorem 4.12. [7, Theorem 1] Suppose that infinitely many iterates are added to the filter. Then there exists a subsequence {k_j} such that

    lim_{j→∞} θ(wk_j) = 0,  lim_{j→∞} θg(wk_j) = 0.
It remains to consider the case where the algorithm runs infinitely but only finitely many iterates are added to the filter.

Theorem 4.13. [7, Theorem 2] Suppose that the algorithm runs infinitely and only finitely many iterates are added to the filter. Then

    lim_{k→∞} θ(wk) = 0,  lim inf_{k→∞} θg(wk) = 0.
The main result is obtained by combining Theorem 4.12 and Theorem 4.13.

Corollary 4.14. Under Assumption 4.1, the sequence of iterates {wk} generated by Algorithm 3.5 satisfies

    lim inf_{k→∞} (θ1(wk) + θ2(wk) + θ3(wk)) = 0.
5 Concluding remarks
We have proposed a primal-dual interior-point filter method for nonlinear semidefinite programming, based on a new definition of the filter entries. With this definition, a three-dimensional filter method is obtained for constrained optimization for the first time. Although the acceptance criterion for the filter used in this paper is slightly strong, the new definition of the filter triples may open the way to multidimensional filter methods for constrained optimization. Future work will aim at weakening the acceptance requirement.
References
[1] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SLP-filter algorithm. Report 98/13, Department of Mathematics, FUNDP, Namur (B), 1998.

[2] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, 91(2):239–269, 2002.

[3] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM Journal on Optimization, 13(1):44–59, 2002.

[4] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM Journal on Optimization, 13(3):635–659, 2002.

[5] N. I. M. Gould, S. Leyffer, and Ph. L. Toint. A multidimensional filter algorithm for nonlinear equations and nonlinear least-squares. SIAM Journal on Optimization, 15(1):17–38, 2005.

[6] N. I. M. Gould, C. Sainvitu, and Ph. L. Toint. A filter-trust-region method for unconstrained optimization. SIAM Journal on Optimization, 16(2):341–357, 2006.

[7] M. Ulbrich, S. Ulbrich, and L. N. Vicente. A globally convergent primal-dual interior-point filter method for nonlinear programming. Mathematical Programming, 100(2):379–410, 2004.