Self-Concordant Barriers for Convex Approximations of
Structured Convex Sets
Levent Tuncel∗ Arkadi Nemirovski†
C & O Research Report: CORR 2007–03, February 22, 2007
Abstract
We show how to approximate the feasible region of structured convex optimization problems by a family of convex sets with explicitly given and efficient (if the accuracy of the approximation is moderate) self-concordant barriers. This approach extends the reach of the modern theory of interior-point methods, and lays the foundation for new ways to treat structured convex optimization problems with a very large number of constraints. Moreover, our approach provides a strong connection from the theory of self-concordant barriers to the combinatorial optimization literature on solving packing and covering problems.
Keywords: convex optimization, self-concordant barriers, semidefinite programming, interior-point methods, packing-covering problems
AMS Subject Classification: 90C25, 90C51, 90C22, 90C05, 52A41, 49M37, 90C59, 90C06
∗([email protected]) Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. Research of this author was supported in part by Discovery Grants from NSERC.
†([email protected]) School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, 30332-0205 USA. Part of this research was done while this author was an Adjunct Professor at the Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, supported in part by a Discovery Grant from NSERC.
1 Introduction
In modern convex optimization, when we consider polynomial-time algorithms, two families of algorithms stand out:
• interior-point methods,
• ellipsoid method and related first-order methods.
Let G ⊂ R^n be a closed convex set with nonempty interior, and let c ∈ R^n be given. To solve the convex optimization problem

inf {〈c, x〉 : x ∈ G},

modern interior-point methods usually require a computable self-concordant barrier function f for G. Such functions (defined in the next section) completely describe the set G and its boundary in a very special way. The ellipsoid and related first-order methods require an efficient separation oracle for G; such an oracle provides some local information about the set G each time it is called.

The ellipsoid method and the first-order methods require very little global information about G in each iteration. Another important difference between the two families of methods is that interior-point methods usually need f′ and f″ (the first and second derivatives of the self-concordant barrier f) at every iteration, and the elementary operations needed can be as bad as Θ(n³) per iteration. In contrast, the ellipsoid method and the first-order methods can work with O(n²) operations per iteration and sometimes, for structured problems, they require much less than Θ(n²) work per iteration.
If we have a self-concordant barrier for G with barrier parameter ϑ, then O(√ϑ ln(1/ε)) iterations of interior-point methods suffice to produce an ε-optimal solution (we assume that certain scale factors for the input problem are bounded by O(1/ε)). This bound is significantly better than the best bounds of similar nature for the first-order algorithms. Moreover, whenever an application requires a very high accuracy of the solution, the practical performance of the interior-point algorithms is even much better than that of the corresponding first-order methods.
Nevertheless, an important advantage of the first-order methods can be observed at another extreme (low accuracy and extremely large dimension). The first-order methods can be used to solve extremely large-scale problems (those for which performing even a single iteration of the interior-point methods is out of reach of current hardware/software combinations) as long as the required accuracy of the final solution is modest, e.g., 10⁻² or 10⁻⁴. Indeed, in many real applications, it does not even make sense to ask for more accuracy than 10⁻², due to the nature of the problem and the data collected, as well as the final (practical) uses of the approximate solutions.
Another important theoretical property of the ellipsoid method and the related first-order methods is that, in some sense (in the black-box oracle model), for dimension-independent iteration bounds they are optimal. That is, the O(1/ε²) upper bound on the number of iterations of a first-order method which only uses black-box subgradient information cannot be improved ([15]). However, as Nesterov [19] recently showed, utilizing certain knowledge of the structure of the convex optimization problem at hand does help improve this upper bound very significantly, to O(1/ε) (see also [16], [13]).
In combinatorial optimization, one of the most interesting and quite general structures isdescribed by packing and covering problems; see the recent book by Cornuejols [6]. Given an
m-by-n matrix A with only 0,1 entries, and an objective function vector c, the combinatorial optimization problem

max {〈c, x〉 : Ax ≤ e, x ∈ {0, 1}^n}

(where e is the all ones vector of appropriate size) is a packing problem. The combinatorial optimization problem

min {〈c, x〉 : Ax ≥ e, x ∈ {0, 1}^n}

is a covering problem. Both theoretical and practical approaches for solving packing and covering problems usually involve their linear programming relaxations. We will mostly deal with such problems and their generalizations.
Let a_i denote the i-th row of A. Consider the function

H(x) := ln(∑_{i=1}^m exp{〈a_i, x〉}).
The so-called exponential "potential" function H(x) has been used in the context of approximately solving special classes of linear optimization problems (mainly those arising from covering and packing problems and minimum-cost multicommodity flow problems in combinatorial optimization); see [23, 10, 11, 21, 7, 4, 5]. In fact, such an approach proved useful and interesting even in the case of convex optimization problems with special convex functions and block-diagonal structure in the constraints [9]. However, this function is not a self-concordant barrier and, to the best of our knowledge, self-concordant barriers previously played no role in this context.
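As a minimal numerical sketch of the smoothing property behind this potential (the data below is random and purely illustrative, not taken from the text), the following checks the standard bounds max_i 〈a_i, x〉 ≤ H(x) ≤ max_i 〈a_i, x〉 + ln m:

```python
import numpy as np

# Hypothetical data: a small instance with m = 4 nonnegative rows.
rng = np.random.default_rng(0)
A = rng.random((4, 3))          # rows a_i >= 0
x = rng.random(3)

def H(x, A):
    """Exponential potential H(x) = ln(sum_i exp{<a_i, x>})."""
    return np.log(np.sum(np.exp(A @ x)))

m = A.shape[0]
mx = np.max(A @ x)
# Smoothing bounds: max_i <a_i,x> <= H(x) <= max_i <a_i,x> + ln m
assert mx <= H(x, A) <= mx + np.log(m)
```

The gap of at most ln m between H and the max is exactly what drives the choice L > ln m in the log-exp construction of Section 2.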
Also, very good complexity bounds were recently obtained for first-order algorithms for linear programming problems via approximations to the feasible regions which have certain symmetry properties [17].
In contrast to the existing work above, we will show how to approximate the feasible region of the structured optimization problem at hand by a family of convex sets for which we have efficient (if the accuracy of approximation is not too high) self-concordant barriers.
We will construct self-concordant barriers for a convex approximation of G whose parameter ϑ is either independent of, or only logarithmically dependent on, the larger of the problem dimensions (m), but strongly dependent on the goodness of our approximation to G.
Our work serves the following three purposes:
• We lay the foundation to bring the theory of modern interior-point methods (based on the self-concordance theory) closer to the ellipsoid method and the recently proposed first-order methods in terms of worst-case iteration bounds that are independent of the larger of the problem dimensions, but dependent on the desired accuracy ε as a polynomial in 1/ε.
• From a technical point of view, we show a new way of dealing with exponentially many terms in designing self-concordant barriers.

• We make a strong connection from interior-point theory to the combinatorial optimization literature by providing new theoretical results for packing-covering LPs (and their vast, nonpolyhedral generalizations) based on self-concordant barriers.
A very important warning to the reader is about the computability of the barriers. Let m be the larger of the dimensions in the problem data (and n the smaller one). While the barrier parameters will grow at most with ln(m), if m is very large, say m ≈ 2^n, then evaluating our barriers directly from the formulae we give can require Ω(m) (≈ 2^n) work.
The paper is organized as follows. The next section deals with the LP relaxations of packing and covering problems; two families of self-concordant barriers are derived, one based on the exponential function and the other based on the p-norm. Section 3 is very brief and simply points out that it is elementary to replace the nonnegative orthant by a closed convex cone. Section 4 generalizes the results much more significantly, by replacing the linear functionals from Section 2 with linear operators and by replacing the linear inequalities of Section 2 with the partial orders induced by the cone of positive semidefinite matrices. In Section 5, we illustrate that some basic patterns of the first three derivatives of the functions we used can be extended to a semi-infinite dimensional setting. The development up to Section 6 is not primal-dual symmetric; so, in Section 6 we study duality in this context and, based on a technique used by the second author earlier, show how to generate a good dual feasible solution from a good, central primal solution. Finally, in Section 7 we conclude the technical results of the paper with a proposition showing that the square of the matrix p-norm function has Lipschitz continuous gradient with Lipschitz constant 2(p − 1). This fact is useful in improving complexity bounds for convex optimization problems involving such matrix functions.
2 Packing-Covering LPs
In this section, we start our study with the packing-covering LPs. First, we define the well-known notion of self-concordant barriers.
Definition 2.1 Let G ⊂ R^n be a closed convex set with nonempty interior. Then f : int(G) → R is called a self-concordant barrier (s.c.b.) for G with barrier parameter ϑ if the following conditions are satisfied:

• f ∈ C³ and f is strictly convex on int(G);

• for every sequence {x^(k)} ⊂ int(G) such that x^(k) → x ∈ ∂G, f(x^(k)) → +∞;

• |D³f(x)[h, h, h]| ≤ 2 [D²f(x)[h, h]]^{3/2}, ∀x ∈ int(G), ∀h ∈ R^n;

• (Df(x)[h])² ≤ ϑ D²f(x)[h, h], for every x ∈ int(G), h ∈ R^n.
Suppose G is a closed convex cone with nonempty interior. Then an s.c.b. f for G with barrier parameter ϑ is called logarithmically homogeneous if

f(tx) = f(x) − ϑ ln t, ∀x ∈ int(G), ∀t > 0.
Consider the LP relaxation of a packing problem. Then the constraints Ax ≤ e are satisfied iff max_i 〈a_i, x〉 ≤ 1. Roughly stated, we consider two approximations to the latter condition:
1. Log-exp construction:

∑_{i=1}^m exp{L〈a_i, x〉} ≤ exp{L}, for large L;

2. ‖·‖_p-construction:

(∑_{i=1}^m 〈a_i, x〉^p)^{1/p} ≤ 1, for large p.
2.1 The log-exp construction
We begin with the results of approximating the constraint max_i 〈a_i, x〉 ≤ 1 by ∑_i exp{L〈a_i, x〉} ≤ exp{L}, for large L.
Proposition 2.1 Let a_i ∈ R^n_+ \ {0}, i = 1, ..., m, and let

G(L) := {x ∈ R^n_+ : ∑_i exp{L〈a_i, x〉} ≤ exp{L}},  [L > max{ln m, 3/2}];

G(s) := {x ∈ R^n_+ : 〈a_i, x〉 ≤ s, i = 1, ..., m},  [s > 0].

Then

G(1 − (ln m)/L) ⊆ G(L) ⊆ G(1)   (1)

and the function

F_L(x) := − ln(L − ln(∑_i exp{L〈a_i, x〉})) − (2L/3)² F₋(x),   (2)

where

F₋(x) := ∑_{j=1}^n ln x_j,

is a ϑ(L)-self-concordant barrier for G(L), with

ϑ(L) := 1 + (2L/3)² n.   (3)
Proof. 1°. We clearly have

∑_i exp{L〈a_i, x〉} ≤ exp{L} ⇒ exp{L〈a_i, x〉} ≤ exp{L} ∀i ⇒ 〈a_i, x〉 ≤ 1 ∀i,

whence G(L) ⊆ G(1). On the other hand, if x ∈ G(s), then ∑_i exp{L〈a_i, x〉} ≤ m exp{Ls}, so that

s ≤ 1 − (ln m)/L ⇒ x ∈ G(L).

(1) is proved.
2°. Let

H(x) := ln(∑_i exp{〈b_i, x〉}),  b_i := L a_i.

Then

DH(x)[h] = (∑_i exp{〈b_i, x〉}〈b_i, h〉)/(∑_i exp{〈b_i, x〉}) = ∑_i p_i 〈b_i, h〉,  [p_i := exp{〈b_i, x〉}/∑_j exp{〈b_j, x〉}, ∑_i p_i = 1]   (4)
whence

D²H(x)[h, h] = −(∑_i exp{〈b_i, x〉}〈b_i, h〉)²/(∑_i exp{〈b_i, x〉})² + (∑_i exp{〈b_i, x〉}〈b_i, h〉²)/(∑_i exp{〈b_i, x〉})
= −(∑_i p_i 〈b_i, h〉)² + ∑_i p_i 〈b_i, h〉² = ∑_i p_i s_i²,  [s_i := 〈b_i, h〉 − µ, µ := ∑_j p_j 〈b_j, h〉, ∑_i p_i s_i = 0]   (5)

and finally

D³H(x)[h, h, h] = 2(∑_i exp{〈b_i, x〉}〈b_i, h〉)³/(∑_i exp{〈b_i, x〉})³ − 3(∑_i exp{〈b_i, x〉}〈b_i, h〉)(∑_i exp{〈b_i, x〉}〈b_i, h〉²)/(∑_i exp{〈b_i, x〉})² + (∑_i exp{〈b_i, x〉}〈b_i, h〉³)/(∑_i exp{〈b_i, x〉})
= 2(∑_i p_i(s_i + µ))³ − 3(∑_i p_i(s_i + µ))(∑_i p_i(s_i + µ)²) + ∑_i p_i(s_i + µ)³
= 2µ³ − 3µ(µ² + ∑_i p_i s_i²) + ∑_i p_i(s_i³ + 3s_i²µ + 3s_iµ² + µ³)
= 2µ³ − 3µ³ − 3µ∑_i p_i s_i² + ∑_i p_i s_i³ + 3µ∑_i p_i s_i² + µ³ = ∑_i p_i s_i³.   (6)
We arrive at the following observation:
Lemma 2.1 Let x ∈ G(L) ∩ int R^n_+ and let h be such that x ± h ∈ R^n_+. Then

|D³H(x)[h, h, h]| ≤ 2L D²H(x)[h, h].   (7)

Proof. Indeed, if x ± h ∈ R^n_+, then |h_j| ≤ x_j, j = 1, ..., n, whence |〈b_i, h〉| ≤ 〈b_i, x〉 ≤ L (recall that a_i ≥ 0 and x ∈ G(L) ⊆ G(1)). It follows that in the notation of (4) – (6) we have |µ| ≤ L, whence |s_i| ≤ 2L. With this in mind, (7) follows from the concluding relations in (5) and (6).
Now let us use the following result from [20]:
Lemma 2.2 Let

• G₊ ⊂ R^N be a closed convex domain, F₊ be a ϑ₊-self-concordant barrier for G₊, and K be the recessive cone of G₊;

• G₋ be a closed convex domain in R^n and F₋ be a ϑ₋-self-concordant barrier for G₋;

• A : int G₋ → R^N be a C³ mapping such that D²A(x)[h, h] ∈ −K for all x ∈ int G₋ and

∀(x ∈ int G₋, A(x) ∈ int G₊) ∀(h, x ± h ∈ G₋) : D³A(x)[h, h, h] ≤_K −3β D²A(x)[h, h];   (8)

• the set G° := {x ∈ int G₋ : A(x) ∈ int G₊} be nonempty.

Then G° is an open convex domain, and the function

F₊(A(x)) + max[1, β²] F₋(x)

is a self-concordant barrier for cl G° with the parameter

ϑ := ϑ₊ + max[1, β²] ϑ₋.
Note: In [20], relation (8) is assumed to be valid for all x ∈ int G₋. However, the proof presented in [20] in fact requires only the weaker form of the assumption given by (8).
Now let us specialize the data in the statement of Lemma 2.2 as follows:

• G₊ := {t ≥ 0} ⊂ R, F₊(t) := − ln(t) (ϑ₊ = 1, K := R₊);

• G₋ := R^n_+, F₋(x) := − ∑_{j=1}^n ln x_j (ϑ₋ = n);

• A(x) := L − ln(∑_i exp{L〈a_i, x〉}).

When L > ln m, this data clearly satisfies all of the requirements from the premise of Lemma 2.2, except for (8); by Lemma 2.1, the latter requirement also is satisfied, with β = 2L/3. Applying Lemma 2.2, we arrive at the desired result.
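The inclusions (1) can be sampled numerically. The sketch below uses random nonnegative data (purely illustrative, not from the text): points placed just inside G(1 − (ln m)/L) are confirmed to lie in G(L), and points of G(L) are confirmed to lie in G(1):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, L = 8, 3, 10.0              # illustrative sizes; L > max{ln m, 3/2}
A = rng.random((m, n))            # rows a_i >= 0 (hypothetical data)

def in_G_L(x):
    # x in G(L): sum_i exp{L <a_i,x>} <= exp{L}
    return np.sum(np.exp(L * (A @ x))) <= np.exp(L)

def in_G_s(x, s):
    # x in G(s): <a_i,x> <= s for all i
    return np.all(A @ x <= s)

s_inner = 1.0 - np.log(m) / L
for _ in range(200):
    x = rng.random(n)
    x *= 0.99 * s_inner / np.max(A @ x)   # place x just inside G(s_inner)
    assert in_G_s(x, s_inner) and in_G_L(x)    # inner inclusion of (1)
    y = rng.random(n)
    if in_G_L(y):
        assert in_G_s(y, 1.0)                   # outer inclusion G(L) ⊆ G(1)
```

This is only a spot check at sample points, of course; the proof above establishes the inclusions for all x.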
2.2 The ‖ · ‖p-construction
Now, we consider approximating the constraint max_i 〈a_i, x〉 ≤ 1 via a p-norm function.

Proposition 2.2 Let a_i ∈ R^n_+ \ {0}, i = 1, ..., m. For p ≠ 0 let

H(x) := (∑_{i=1}^m 〈a_i, x〉^p)^{1/p},

and let

K(p) := {(x, t) ∈ R^n_+ × R : t ≥ H(x)}, for p ≥ 1;
K(p) := {(x, t) ∈ R^n_+ × R : t ≤ H(x)}, for p ≤ 1, p ≠ 0.

Then

p ≥ 1 ⇒ {(x, t) ∈ R^n_+ × R : max_i 〈a_i, x〉 ≤ t/m^{1/p}} ⊆ K(p) ⊆ {(x, t) ∈ R^n_+ × R : max_i 〈a_i, x〉 ≤ t},
p < 0 ⇒ {(x, t) ∈ R^n_+ × R : min_i 〈a_i, x〉 ≥ t/m^{1/p}} ⊆ K(p) ⊆ {(x, t) ∈ R^n_+ × R : min_i 〈a_i, x〉 ≥ t},   (9)

and the function

F_p(x) := −((2|p − 2| + 3)/3)² F₋(x) − { ln(t − H(x)), for p ≥ 1; ln(H(x) − t), for p ≤ 1, p ≠ 0 },  F₋(x) := ∑_{j=1}^n ln x_j,   (10)

is a ϑ_p-logarithmically homogeneous self-concordant barrier for K(p), with

ϑ_p := 1 + ((2|p − 2| + 3)/3)² n.   (11)
Proof. We proceed as in the proof of Proposition 2.1. As in the latter proof, the only facts to be verified are that

(a) H is convex on int R^n_+ when p ≥ 1, and is concave on int R^n_+ when p ≤ 1, p ≠ 0;

(b) whenever x ∈ int R^n_+ and h is such that x ± h ∈ R^n_+, we have

|D³H(x)[h, h, h]| ≤ (2|p − 2| + 3) |D²H(x)[h, h]|.   (12)
Assume that p ≠ 0. Given x ∈ int R^n_+, a_i ∈ R^n_+ \ {0}, and h ∈ R^n such that x ± h ∈ R^n_+, let us set

p_i(x) := 〈a_i, x〉^p / ∑_j 〈a_j, x〉^p,  δ_i(x) := 〈a_i, h〉/〈a_i, x〉,  µ(x) := ∑_i p_i(x) δ_i(x),  s_i(x) := δ_i(x) − µ(x).

We have

DH(x)[h] = H(x) µ(x);

D²H(x)[h, h] = H(x)[µ²(x) + Dµ(x)[h]]
= H(x)[µ²(x) + ∑_i (p − 1) p_i(x) δ_i²(x) − p µ²(x)]
= H(x)[µ²(x) + (p − 1)∑_i p_i(x) s_i²(x) + (p − 1)µ²(x) − p µ²(x)]
= (p − 1) H(x) ∑_i p_i(x) s_i²(x);

D³H(x)[h, h, h] = H(x)[µ³(x) + µ(x) Dµ(x)[h] + 2µ(x) Dµ(x)[h] + D²µ(x)[h, h]]
= H(x)[µ³(x) + 3µ(x) Dµ(x)[h] + D²µ(x)[h, h]]
= H(x)[µ³(x) + 3µ(x) Dµ(x)[h] − 2p µ(x) Dµ(x)[h] − 2(p − 1)∑_i p_i(x) δ_i³(x) + p(p − 1)∑_i p_i(x) δ_i³(x) − p(p − 1)(∑_i p_i(x) δ_i²(x)) µ(x)]
= H(x)[µ³(x) + (3 − 2p)µ(x)[(p − 1)∑_i p_i(x) δ_i²(x) − p µ²(x)] + (p − 1)(p − 2)∑_i p_i(x) δ_i³(x) − p(p − 1)µ(x)∑_i p_i(x) δ_i²(x)]
= H(x)[[1 − p(3 − 2p)]µ³(x) + (p − 1)(3 − 3p)µ(x)[∑_i p_i(x) s_i²(x) + µ²(x)] + (p − 1)(p − 2)∑_i p_i(x)[s_i³(x) + 3s_i²(x)µ(x) + 3s_i(x)µ²(x) + µ³(x)]]
= H(x)[(p − 1)(p − 2)∑_i p_i(x) s_i³(x) − 3(p − 1)µ(x)∑_i p_i(x) s_i²(x)].

Since a_i ∈ R^n_+, x ∈ int R^n_+ and x ± h ∈ R^n_+, we have |δ_i(x)| ≤ 1, whence |µ(x)| ≤ 1 and |s_i(x)| ≤ 2. We see that H is convex when p ≥ 1, H is concave when p ≤ 1, p ≠ 0, and (12) takes place.
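Pointwise, the inclusions (9) reduce to the elementary sandwich between the p-norm and the max of nonnegative numbers. A small check under random illustrative data (the instance below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 8, 3, 12.0               # illustrative sizes; p >= 1
A = rng.random((m, n))             # rows a_i >= 0 (hypothetical data)
x = rng.random(n)

z = A @ x                          # all <a_i, x> >= 0 here
H_p = np.sum(z ** p) ** (1.0 / p)
mx = np.max(z)
# Pointwise form of the two-sided inclusion (9) for p >= 1:
#   max_i <a_i,x>  <=  H_p(x)  <=  m^{1/p} * max_i <a_i,x>
assert mx <= H_p + 1e-12
assert H_p <= m ** (1.0 / p) * mx + 1e-12
```

As p grows, the factor m^{1/p} tends to 1, i.e., H_p approximates the max constraint more and more tightly, at the cost of a larger compatibility constant in (12).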
Corollary 2.1 For i = 1, ..., m, let a_i ∈ R^n_+ \ {0} and π_i > 0 be such that ∑_i π_i = 1. Then the concave function f(x, t) = 〈a_1, x〉^{π_1} ··· 〈a_m, x〉^{π_m} − t, considered as a mapping from int G₋ := int R^n_+ × R to R, satisfies (8) with G₊ := R₊ and β = 7/3:

x > 0, x ± h ≥ 0 ⇒ |D³f(x)[h, h, h]| ≤ −7 D²f(x)[h, h].
Proof. Indeed, we have seen that if 0 < p < 1 then H_p(x) = (∑_i 〈a_i, x〉^p)^{1/p} satisfies the relation

x > 0, x ± h ≥ 0 ⇒ |D³H_p(x)[h, h, h]| ≤ −(2|p − 2| + 3) D²H_p(x)[h, h].

It remains to note that, as p → 0+, the functions H_p(x)/m^{1/p} converge, uniformly along with their derivatives on every compact subset of int R^n_+, to the function g(x) = (∏_{i=1}^m 〈a_i, x〉)^{1/m}. It follows that the desired inequality is valid when π_1 = ··· = π_m = 1/m. This fact in turn implies that the desired relation is valid when all π_i are rational, which in turn implies the validity of the statement for all π_i > 0 with unit sum.
As a direct consequence of the last corollary, we also have the following fact.
Corollary 2.2 The function

F(x) := − ln(∏_{i=1}^m 〈a_i, x〉^{π_i} − t) − (7/3)² ∑_{j=1}^n ln x_j

is an O(1)n-self-concordant barrier for the convex set

{(t, x) ∈ R₊ × R^n_+ : ∏_{i=1}^m 〈a_i, x〉^{π_i} ≥ t}.
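The inequality of Corollary 2.1 can be checked numerically via the exact directional derivatives of g(x) = ∏_i 〈a_i, x〉^{π_i}, written in the same δ/µ notation as the proofs above. The derivative formulas below are a routine computation (not quoted from the text), and the data is random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 4
A = rng.random((m, n)) + 0.1          # a_i in R^n_+ \ {0} (hypothetical data)
pi = rng.random(m); pi /= pi.sum()    # weights pi_i > 0 summing to 1

x = rng.random(n) + 0.5               # x > 0
h = rng.uniform(-1, 1, n) * x * 0.99  # ensures x ± h >= 0 componentwise

# Directional derivatives of g(x) = prod_i <a_i,x>^{pi_i} along h,
# with delta_i = <a_i,h>/<a_i,x>, mu = sum_i pi_i delta_i, s_i = delta_i - mu:
#   Dg = g*mu,  D^2 g = -g*sum pi_i s_i^2,
#   D^3 g = g*(3*mu*sum pi_i s_i^2 + 2*sum pi_i s_i^3).
g = np.prod((A @ x) ** pi)
delta = (A @ h) / (A @ x)
mu = pi @ delta
s = delta - mu
d2 = -g * (pi @ s**2)
d3 = g * (3 * mu * (pi @ s**2) + 2 * (pi @ s**3))
assert d2 <= 0 and abs(d3) <= -7 * d2 + 1e-12
```

Since |δ_i| ≤ 1 here, |µ| ≤ 1 and |s_i| ≤ 2, so |D³g| ≤ (3 + 4)·g·∑π_i s_i² = −7 D²g, matching the constant in the corollary.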
In many problems from combinatorial optimization, we are usually interested in computing the maximum (or minimum) cardinality of sets satisfying certain criteria, e.g., maximum cardinality stable set, maximum cardinality matching, minimum cardinality node cover, minimum number of colors needed to color a graph. In these cases, a 1/n-approximation to the underlying polytope usually yields an exact algorithm to compute that maximum or minimum value. First, we work with the desired tolerance ε; then we will substitute ε = 1/n. Thus, in our notation above, we need

1 − (ln m)/L ≥ 1 − ε,

which is equivalent to

L ≥ (ln m)/ε.

Therefore, the self-concordance parameter of F_L is

1 + (4 n ln²(m))/(9 ε²),

and in the case of ε = 1/n we arrive at ϑ(L) = O(n⁵). This would imply an iteration bound of O(n^{2.5} ln n).
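A quick back-of-the-envelope computation of ϑ(L) under these substitutions (the concrete n below is an arbitrary illustration, and m ~ 2^n stands in for an exponentially large constraint family, so ln m is proportional to n):

```python
import math

def barrier_parameter(n, m, eps):
    # theta(L) = 1 + (2L/3)^2 * n with L = ln(m)/eps, per the text
    L = math.log(m) / eps
    return 1.0 + (2.0 * L / 3.0) ** 2 * n

# With eps = 1/n and ln m proportional to n, theta grows like n^5:
n = 50
theta = barrier_parameter(n, m=2 ** n, eps=1.0 / n)
ratio = theta / n ** 5    # roughly (4/9) * ln(2)^2, about 0.21
```

With m polynomial in n instead, ln m = O(ln n) and the same formula gives the much smaller ϑ(L) = O(n³ ln² n).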
One distinct advantage of these new self-concordant barriers is that their barrier parameter can be kept fixed as we add cutting planes. Let us fix the desired tolerance ε. In many cutting plane schemes that admit polynomial or pseudo-polynomial complexity analyses, one can bound from above the number of cutting planes that will be generated by the scheme. Let us suppose this upper bound is m, and that m is bounded by a polynomial function of the input. When we construct our barrier F_L (or F_p), we can compute L (or p) using the upper bound m. Then, in a cutting-plane scheme, as we add new constraints, the barrier parameter stays fixed at ϑ(L) (or ϑ_p). This can be a significant advantage, at least in theoretical work on such algorithms.
In what follows, we call R^n_+ the primary domain, and name the set

{x ∈ R^n : 〈a_i, x〉 ≤ 1, i = 1, 2, ..., m}

the secondary domain. Next, we show that the above approach can be widely generalized in terms of each of these domains.
3 Generalization of the Primary Domain
From the outlined proofs it is clear that the above results remain valid when

• the primary domain (the nonnegative orthant R^n_+) is replaced by an arbitrary closed convex pointed cone K with nonempty interior;

• the assumption a_i ∈ R^n_+ \ {0} is replaced by a_i ∈ K* \ {0}, where K* is the cone dual to K,

K* := {s ∈ R^n : 〈x, s〉 ≥ 0, ∀x ∈ K};

• the barrier F₋(x) = −∑_j ln x_j for R^n_+ is replaced by a ϑ₋-self-concordant barrier F for K, and the factor n in (3), (11) is replaced by ϑ₋.
4 Generalization of the Secondary Domain
In addition to the above generalization of the primary domain, we can also generalize each constraint of the secondary domain. For example, for each i, we can replace 〈a_i, x〉 ≤ 1 by A_i(x) ⪯ I, where A_i : R^n → S^{n_i} is a linear map (from R^n to the space of n_i-by-n_i symmetric matrices with real entries) and "⪯" is the partial order induced by the cone of positive semidefinite matrices, S^{n_i}_+, in S^{n_i}. So, A_i(x) ⪯ I means [I − A_i(x)] ∈ S^{n_i}_+. The results of Sections 2 and 3 are included as the special case n_i = 1 for every i = 1, 2, ..., m. We first present the generalization of the p-norm construction.
4.1 The ‖ · ‖p-construction
4.1.1 Compatibility
Let Tr : S^n → R denote the trace and let S^n_{++} denote the interior of S^n_+. We sometimes write x ≻ 0 to mean that x is a symmetric positive definite matrix. We start by establishing the following fact:
Proposition 4.1 Let p, |p| ≥ 2, be an integer. Consider the following functions of x ∈ S^n_{++}:

F(x) := Tr(x^p),  f(x) := (F(x))^{1/p}.

Then f is convex when p ≥ 2, f is concave when p ≤ −2, and

x ≻ 0, x ± h ⪰ 0 ⇒ |D³f(x)[h, h, h]| ≤ O(1)|p| |D²f(x)[h, h]|   (13)

(from now on, all O(1)'s are positive absolute constants).
Proof. 1°. Let us compute the derivatives of F and f. We have

Df(x)[h] = (1/p)(F(x))^{1/p} DF(x)[h]/F(x) = f(x) DF(x)[h]/(p F(x));

D²f(x)[h, h] = f(x)(DF(x)[h]/(p F(x)))² + f(x) D²F(x)[h, h]/(p F(x)) − f(x)(DF(x)[h])²/(p F²(x))
= f(x)[(1 − p)(DF(x)[h]/(p F(x)))² + D²F(x)[h, h]/(p F(x))];

D³f(x)[h, h, h] = f(x)[(1 − p)(DF(x)[h]/(p F(x)))³ + DF(x)[h] D²F(x)[h, h]/(p² F²(x))]
+ f(x)[2(1 − p)(DF(x)[h]/(p² F(x)))[D²F(x)[h, h]/F(x) − (DF(x)[h])²/F²(x)] − DF(x)[h] D²F(x)[h, h]/(p F²(x)) + D³F(x)[h, h, h]/(p F(x))]
= f(x)[(2p − 1)(p − 1)(DF(x)[h]/(p F(x)))³ − 3(p − 1) DF(x)[h] D²F(x)[h, h]/(p² F²(x)) + D³F(x)[h, h, h]/(p F(x))].
Let {e_j} be the orthonormal eigenbasis of x and let x_j denote the corresponding eigenvalues of x (i.e., x e_j = x_j e_j). Further, let h_{kj} = e_j^T h e_k and η_{kj} = x_k^{−1/2} x_j^{−1/2} h_{kj}. We have, assuming all x_j's distinct:

DF(x)[h] = p Tr(x^{p−1} h) = p ∑_j x_j^{p−1} h_{jj} = (1/2πi) ∮ p Tr((zI − x)^{−1} h) z^{p−1} dz;

D²F(x)[h, h] = (1/2πi) ∮ p Tr((zI − x)^{−1} h (zI − x)^{−1} h) z^{p−1} dz = (1/2πi) ∮ p ∑_{j,k} h_{jk}² z^{p−1}/((z − x_j)(z − x_k)) dz
= ∑_{j≠k} h_{jk}² p (x_j^{p−1} − x_k^{p−1})/(x_j − x_k) + ∑_j p(p − 1) h_{jj}² x_j^{p−2};

D³F(x)[h, h, h] = (2/2πi) ∮ p Tr((zI − x)^{−1} h (zI − x)^{−1} h (zI − x)^{−1} h) z^{p−1} dz
= (2/2πi) ∮ p ∑_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} z^{p−1}/((z − x_j)(z − x_k)(z − x_ℓ)) dz = 2p ∑_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} Γ(x_j, x_k, x_ℓ),

where, for distinct a, b, c,

Γ(a, b, c) = a^{p−1}/((a − b)(a − c)) + b^{p−1}/((b − a)(b − c)) + c^{p−1}/((c − a)(c − b)),

and

Γ(a, b, c) = lim_{(a′,b′,c′)→(a,b,c), a′≠b′≠c′≠a′} Γ(a′, b′, c′)

otherwise. Let q = |p|, and let k ≠ j. When p ≥ 2, we have

x_j x_k (x_j^{p−1} − x_k^{p−1})/(x_j − x_k) = ∑_{α+β=q, α,β≥1} x_j^α x_k^β.

When p ≤ −2, we have

(x_j^{p−1} − x_k^{p−1})/(x_j − x_k) = ((1/x_j)^{q+1} − (1/x_k)^{q+1})/(x_j x_k ((1/x_k) − (1/x_j))) = −x_j^{−1} x_k^{−1} ∑_{α+β=q} x_j^{−α} x_k^{−β},

that is, x_j x_k (x_j^{p−1} − x_k^{p−1})/(x_j − x_k) = −∑_{α+β=q} x_j^{−α} x_k^{−β}.
It follows that

f^{−1}(x) Df(x)[h] = (∑_j x_j^p η_{jj})/(∑_j x_j^p) = ∑_j p_j η_{jj} =: µ,  [p_j := x_j^p/∑_k x_k^p];
further, in the case of p ≥ 2 we have

f^{−1}(x) D²f(x)[h, h] = (1 − p)µ² + D²F(x)[h, h]/(p F(x)) = (1 − p)µ² + [∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β + ∑_j (p − 1) η_{jj}² x_j^p]/∑_ℓ x_ℓ^p
= (p − 1)[∑_j p_j η_{jj}² − µ²] + (∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p
= (p − 1)∑_j p_j δ_{jj}² + (∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p,  [δ_{jj} := η_{jj} − µ],
while in the case of p ≤ −2 we have

f^{−1}(x) D²f(x)[h, h] = (1 − p)µ² + D²F(x)[h, h]/(p F(x)) = (1 − p)µ² + [−∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β} + ∑_j (p − 1) η_{jj}² x_j^p]/∑_ℓ x_ℓ^p
= (p − 1)∑_j p_j δ_{jj}² − (∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ℓ x_ℓ^p.
Finally, in the case of p ≥ 2 we have

f^{−1}(x) D³f(x)[h, h, h] = (2p − 1)(p − 1)µ³ − 3(p − 1)µ[(p − 1)∑_j p_j η_{jj}² + (∑_{j≠k} η_{jk}² x_j x_k ∑_{α+β=p−2} x_j^α x_k^β)/∑_ℓ x_ℓ^p] + 2(∑_{j,k,ℓ} x_j x_k x_ℓ η_{jk} η_{kℓ} η_{ℓj} Γ(x_j, x_k, x_ℓ))/∑_ν x_ν^p,

and for distinct positive a, b, c the following relations hold:

Γ(a, b, c) = a^{p−1}/((a−b)(a−c)) + b^{p−1}/((b−c)(b−a)) + c^{p−1}/((c−a)(c−b)) = (a^{p−1} − c^{p−1})/((a−b)(a−c)) + (b^{p−1} − c^{p−1})/((b−c)(b−a))
= ∑_{α+β=p−2} [a^α c^β/(a−b) + b^α c^β/(b−a)] = ∑_{0≤β≤p−2} c^β ∑_{0≤α≤p−2−β} (a^α − b^α)/(a−b)
= ∑_{α+β+γ=p−3} a^α b^β c^γ.

This concluding identity clearly remains valid when not all of a, b, c are distinct. Thus, in the case of p ≥ 2 we have

f^{−1}(x) D³f(x)[h, h, h]
= (2p − 1)(p − 1)µ³ − 3(p − 1)µ[(p − 1)µ² + (p − 1)∑_j p_j δ_{jj}² + (∑_{j≠k} η_{jk}² x_j x_k ∑_{α+β=p−2} x_j^α x_k^β)/∑_ℓ x_ℓ^p] + 2(∑_{j,k,ℓ} x_j x_k x_ℓ η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p−3} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= [−p² + 3p − 2]µ³ − 3(p − 1)²µ ∑_j p_j δ_{jj}² + 2∑_j p_j η_{jj}³ ∑_{α+β+γ=p−3} 1 − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= [−p² + 3p − 2]µ³ − 3(p − 1)²µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j η_{jj}³ − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= [−p² + 3p − 2]µ³ − 3(p − 1)²µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j [µ³ + 3µ²δ_{jj} + 3µδ_{jj}² + δ_{jj}³] − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p.
In the case of p ≤ −2 we get Γ(a, b, c) = (1/(abc)) ∑_{α+β+γ=|p|} a^{−α} b^{−β} c^{−γ}, and the resulting formula for f^{−1}(x) D³f(x)[h, h, h] becomes

f^{−1}(x) D³f(x)[h, h, h] = −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ + 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=|p|} x_j^{−α} x_k^{−β} x_ℓ^{−γ})/∑_ν x_ν^p.
The resulting formulas, obtained for the case when all xj are distinct, clearly remain valid for all x ≻ 0.
Thus, for all x ≻ 0 and all h we have, setting f = f(x), df = Df(x)[h], d²f = D²f(x)[h, h], d³f = D³f(x)[h, h, h]:

f^{−1} df = ∑_j p_j η_{jj} =: µ,  [p_j = x_j^p/∑_k x_k^p];

p ≥ 2: f^{−1} d²f = (p − 1)∑_j p_j δ_{jj}² + (∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ν x_ν^p,  [δ_{jj} = η_{jj} − µ];

p ≤ −2: f^{−1} d²f = (p − 1)∑_j p_j δ_{jj}² − (∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ν x_ν^p;

p ≥ 2: f^{−1} d³f = −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ν x_ν^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p;

p ≤ −2: f^{−1} d³f = −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ + 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ν x_ν^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=|p|} x_j^{−α} x_k^{−β} x_ℓ^{−γ})/∑_ν x_ν^p,   (14)
where x_j are the eigenvalues of x, η_{ij} = x_i^{−1/2}(e_i^T h e_j) x_j^{−1/2}, and the e_i form an orthonormal eigenbasis of x. Note that under the premise of (13) we have

−I ⪯ η ⪯ I.   (15)
2°. In the sequel, we focus on the case of p ≥ 2. The reasoning in the case of p ≤ −2 is similar.

We have the following lemma.
Lemma 4.1 Suppose x ≻ 0 and that (15) holds. Then

(a) |δ_{jj}| ≤ 2;  (b) |µ| ≤ 1;

(c) |∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ| ≤ O(1) p ∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β =: O(1) p R.   (16)
Proof. By (15), we have |η_{jj}| ≤ 1. Since µ is a convex combination of the η_{jj}, and δ_{jj} = η_{jj} − µ, (a) and (b) follow.

Let ζ be the matrix obtained from η by replacing the diagonal entries with 0. By (15), we have

−2I ⪯ ζ ⪯ 2I.   (17)
We now have

∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ = 3∑_{k≠j} η_{jj} η_{jk}² ∑_{α+β+γ=p, α,β,γ≥1} x_j^{α+β} x_k^γ [=: I₁] + ∑_{α+β+γ=p, α,β,γ≥1} ∑_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^β x_ℓ^γ [=: I₂, with inner sum denoted Ψ(α, β, γ)].   (18)

We also have

|I₁| ≤ ∑_{k≠j} |η_{jj}| η_{jk}² ∑_{α+β+γ=p, α,β,γ≥1} x_j^{α+β} x_k^γ ≤ ∑_{α+β+γ=p, α,β,γ≥1} ∑_{j≠k} η_{jk}² x_j^{α+β} x_k^γ ≤ (p − 2)∑_{1≤γ<p} ∑_{j≠k} η_{jk}² x_j^{p−γ} x_k^γ,   (19)
where the second inequality is given by (15). Further, Ψ(α, β, γ) clearly is symmetric in its arguments, which gives the first inequality in the following chain (where X = Diag{x₁, ..., x_n} and ‖z‖_F is the Frobenius norm of a matrix z):

|I₂| ≤ ∑_{α+β+γ=p, 1≤α≤β≥γ≥1} 6 |∑_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^{(p−2α)/2} x_k^{(p−2γ)/2} x_ℓ^γ|
= 6∑_{1≤α,γ<p/2} |Tr([X^α ζ X^{(p−2α)/2}][X^{(p−2γ)/2} ζ X^γ] ζ)|
≤ 12∑_{1≤α,γ<p/2} ‖X^α ζ X^{(p−2α)/2}‖_F ‖X^{(p−2γ)/2} ζ X^γ‖_F  [by (17)]
≤ 12∑_{1≤α,γ<p/2} √S_α √S_γ  [S_α := ∑_{µ,ν} x_µ^{2α} ζ_{µν}² x_ν^{p−2α}; note that ∑_{1≤α<p/2} S_α ≤ R]
= 12(∑_{1≤α<p/2} S_α^{1/2})² ≤ 6p ∑_{1≤α<p/2} S_α ≤ 6pR.   (20)
Combining (19) and (20), we arrive at (16)(c).

3°. Combining (16) with (14), we arrive at the desired inequality (13).

The following statement is the matrix analogue of Corollary 2.1.
Proposition 4.2 Let f(x) = Det^{1/m}(x), where x ∈ S^m_{++}. Then f is concave on S^m_{++} and

x ≻ 0, x ± h ⪰ 0 ⇒ |D³f(x)[h, h, h]| ≤ −7 D²f(x)[h, h].   (21)
Proof. Setting H(x) = ln Det(x), x ≻ 0, we have

f(x) = exp{H(x)/m},
Df(x)[h] = f(x)(m^{−1} DH(x)[h]) = f(x)(m^{−1} Tr(x^{−1} h)),
D²f(x)[h, h] = f(x)(m^{−1} DH(x)[h])² + f(x)(m^{−1} D²H(x)[h, h]) = f(x)[(m^{−1} Tr(x^{−1} h))² − m^{−1} Tr(x^{−1} h x^{−1} h)],
D³f(x)[h, h, h] = f(x)(m^{−1} DH(x)[h])[(m^{−1} Tr(x^{−1} h))² − m^{−1} Tr(x^{−1} h x^{−1} h)] + f(x)[−2(m^{−1} Tr(x^{−1} h))(m^{−1} Tr(x^{−1} h x^{−1} h)) + 2 m^{−1} Tr(x^{−1} h x^{−1} h x^{−1} h)].

Setting η = x^{−1/2} h x^{−1/2} and denoting by λ(u) the vector of eigenvalues of u ∈ S^n, by E{g} the average of the coordinates of a vector g, and by [g]^k, g being a vector, the vector with coordinates g_i^k, we get

Df(x)[h] = f(x) E{λ(η)} = f(x)µ  [µ := E{λ(η)}],
D²f(x)[h, h] = f(x)[µ² − E{[λ(η)]²}] = −f(x) E{[σ]²}  [σ_i := λ_i(η) − µ],
D³f(x)[h, h, h] = f(x)[µ³ − 3µ E{[λ(η)]²} + 2 E{[λ(η)]³}]
= f(x)[µ³ − 3µ[µ² + E{[σ]²}] + 2 E{µ³ e + 3µ²σ + 3µ[σ]² + [σ]³}]
= f(x)[3µ E{[σ]²} + 2 E{[σ]³}].

Under the premise in (21), we have ‖λ(η)‖_∞ ≤ 1, whence |µ| ≤ 1 and ‖σ‖_∞ ≤ 2, which, in view of the above formulas for the derivatives of f, immediately implies the conclusion in (21).
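The conclusion (21) can be verified numerically via the eigenvalue formulas from the proof. The sketch below uses a random positive definite matrix and a random symmetric direction, scaled so that x ± h ⪰ 0 (all data illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mdim = 6
B = rng.random((mdim, mdim))
x = B @ B.T + mdim * np.eye(mdim)      # x ≻ 0 (illustrative)
C = rng.random((mdim, mdim))
h = (C + C.T) / 2

# Scale so the eigenvalues of eta = x^{-1/2} h x^{-1/2} lie in [-1, 1]
# (we use 0.99 for strictness), i.e. x ± h ⪰ 0.
w, V = np.linalg.eigh(x)
x_mhalf = V @ np.diag(w ** -0.5) @ V.T
lam = np.linalg.eigvalsh(x_mhalf @ h @ x_mhalf)
lam = lam * (0.99 / np.max(np.abs(lam)))   # same as rescaling h

# Directional derivatives of f = Det^{1/m} in the proof's notation.
f = np.linalg.det(x) ** (1.0 / mdim)
mu = lam.mean()
sig = lam - mu
d2 = -f * np.mean(sig**2)
d3 = f * (3 * mu * np.mean(sig**2) + 2 * np.mean(sig**3))
assert d2 <= 0 and abs(d3) <= -7 * d2 + 1e-12
```

With |µ| ≤ 1 and ‖σ‖_∞ ≤ 2, the proof's bound |3µE{σ²} + 2E{σ³}| ≤ 7E{σ²} is exactly what the final assertion tests.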
4.1.2 The ‖ · ‖p-barrier
Now, we are ready to state and prove the main result for the matrix generalization of the p-norm construction.
Theorem 4.1 Let K be a closed convex cone with nonempty interior in R^n, let F₋(u) be a ϑ₋-self-concordant barrier for K, and let A_i : R^n → S^{n_i}, i = 1, ..., m, be linear mappings such that

u ∈ int K ⇒ A_i(u) ≻ 0, i = 1, ..., m.

Let us set

A(u) = Diag{A₁(u), ..., A_m(u)}.

(i) Given an integer p ≥ 2, consider the function

g(u) := (Tr([A(u)]^p))^{1/p} : int K → R.

This function is convex and O(1)p-compatible with its domain:

u ∈ int K, u ± du ∈ K ⇒ |D³g(u)[du, du, du]| ≤ O(1) p D²g(u)[du, du],   (22)

so that the function

Φ(t, u) = − ln(t − g(u)) + O(1) p² F₋(u)

is a self-concordant barrier with the parameter

ϑ = 1 + O(1) p² ϑ₋

for the cone

K(p) = {(t, u) ∈ R₊ × K : g(u) ≤ t}.

Moreover, with

θ = (∑_{i=1}^m n_i)^{1/p},

we have

M_θ ⊆ K(p) ⊆ M₁,   (23)

where M_r is the cone

{(t, u) : u ∈ K, A_i(u) ⪯ r^{−1} t I_{n_i}, i = 1, ..., m}.
(ii) Given an integer p ≤ −2, consider the function

g(u) := (Tr([A(u)]^p))^{1/p} : int K → R.

This function is concave and O(1)|p|-compatible with its domain:

u ∈ int K, u ± du ∈ K ⇒ |D³g(u)[du, du, du]| ≤ −O(1)|p| D²g(u)[du, du],   (24)

so that the function

Φ(t, u) = − ln(g(u) − t) + O(1) p² F₋(u)

is a self-concordant barrier with the parameter

ϑ = 1 + O(1) p² ϑ₋

for the cone

K(p) = {(t, u) ∈ R₊ × K : g(u) ≥ t}.

Moreover, with

θ = (∑_{i=1}^m n_i)^{1/|p|},

we have

N₁ ⊆ K(p) ⊆ N_θ,   (25)

where N_r is the cone

{(t, u) : u ∈ K, A_i(u) ⪰ θ r^{−1} t I_{n_i}, i = 1, ..., m}.
Proof. All we need is to prove (22) and (24); the statements on self-concordance of Φ(·) are direct consequences of the former relations (see [20]), and the inclusions (23), (25) are evident.

To prove (22) (the proof of (24) is similar), let N := ∑_i n_i and let f(y) := (Tr(y^p))^{1/p} : S^N_{++} → R. Assuming that u, du satisfy the premise in (22), let us set x := A(u), h := A(du). Since A(·) is a linear mapping which maps int K into S^N_{++}, we have

x ≻ 0, x ± h ⪰ 0,   (26)

whence, by Proposition 4.1,

|D³f(x)[h, h, h]| ≤ O(1) p D²f(x)[h, h].

Since D^κ g(u)[du, ..., du] = D^κ f(x)[h, ..., h], we see that the conclusion in (22) indeed is true.
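A small numerical sketch of the sandwich (23) at a fixed point: with hypothetical block sizes and random positive definite blocks, the inclusion reduces, pointwise, to the p-norm/max sandwich on the eigenvalues of the block-diagonal matrix A(u):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 8                                    # integer p >= 2
sizes = (2, 3, 4)                        # hypothetical block sizes n_i
blocks = []
for ni in sizes:
    B = rng.random((ni, ni))
    blocks.append(B @ B.T + 0.1 * np.eye(ni))   # A_i(u) ≻ 0 at some fixed u

# A(u) = Diag{A_1(u), ..., A_m(u)}
N = sum(sizes)
Au = np.zeros((N, N))
ofs = 0
for blk in blocks:
    k = blk.shape[0]
    Au[ofs:ofs + k, ofs:ofs + k] = blk
    ofs += k

lam = np.linalg.eigvalsh(Au)             # eigenvalues of the block-diagonal matrix
g = np.sum(lam ** p) ** (1.0 / p)        # g(u) = (Tr([A(u)]^p))^{1/p}
# Pointwise form of (23): lam_max <= g <= N^{1/p} * lam_max
assert lam.max() <= g + 1e-9
assert g <= N ** (1.0 / p) * lam.max() + 1e-9
```

The factor θ = N^{1/p} plays the same role here as m^{1/p} did in Proposition 2.2.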
4.2 The log-exp construction
Our log-exp construction for packing and covering LPs also generalizes to the matrix case. In what follows, we use the matrix exponential; that is, for x ∈ S^n,

exp{x} := ∑_{k=0}^∞ (1/k!) x^k.
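As a small sanity check of this definition (random symmetric data, illustrative only), truncating the series and comparing Tr(exp{x}) with the sum of exp of the eigenvalues — the latter being how the quantity F(x) = Tr(exp{x}) of Proposition 4.3 is computed in the eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.random((4, 4))
x = (B + B.T) / 2                       # symmetric x (illustrative)

# Matrix exponential by truncating the series exp{x} = sum_k x^k / k!
E = np.eye(4)
term = np.eye(4)
for k in range(1, 40):
    term = term @ x / k
    E = E + term

# For symmetric x, Tr(exp{x}) equals the sum of exp of the eigenvalues of x.
lam = np.linalg.eigvalsh(x)
assert np.isclose(np.trace(E), np.sum(np.exp(lam)))
```

Forty terms are far more than enough here; in general the truncation length needed grows with ‖x‖.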
We first begin with the compatibility result.
4.2.1 Compatibility
Proposition 4.3 Let

F(x) := Tr(exp{x}),  f(x) := ln F(x)  [x ∈ S^n].

Then

(L I ⪰ x ≻ 0, −x ⪯ h ⪯ x) ⇒ |D³f(x)[h, h, h]| ≤ O(1) L D²f(x)[h, h].   (27)
Proof. 1°. Let us compute the derivatives of f(x), assuming x ≻ 0. We have

Df(x)[h] = DF(x)[h]/F(x) = Tr(exp{x} h)/Tr(exp{x});

D²f(x)[h, h] = −(DF(x)[h]/F(x))² + D²F(x)[h, h]/F(x);

D³f(x)[h, h, h] = 2(DF(x)[h]/F(x))³ − 3 DF(x)[h] D²F(x)[h, h]/F²(x) + D³F(x)[h, h, h]/F(x).
Let x ≻ 0, let {e_j} be an orthonormal eigenbasis of x and let x_j be the corresponding eigenvalues, as before. Also let h_{kj} = e_k^T h e_j, and finally let η_{kj} = x_k^{−1/2} x_j^{−1/2} h_{kj}. We have, assuming all x_j's are distinct:

    DF(x)[h] = Tr(exp{x} h) = (1/2πi) ∮ Tr((zI − x)^{−1} h) e^z dz = Σ_j e^{x_j} h_{jj} = Σ_j η_{jj} x_j e^{x_j},

    D^2 F(x)[h,h] = (1/2πi) ∮ Tr((zI − x)^{−1} h (zI − x)^{−1} h) e^z dz
      = (1/2πi) ∮ Σ_{j,k} h_{jk}^2 e^z / ((z − x_j)(z − x_k)) dz
      = Σ_{j≠k} h_{jk}^2 (e^{x_j} − e^{x_k})/(x_j − x_k) + Σ_j h_{jj}^2 e^{x_j}
      = Σ_{p=1}^∞ Σ_{j≠k} h_{jk}^2 (1/p!) Σ_{α+β=p−1} x_j^α x_k^β + Σ_j h_{jj}^2 e^{x_j}
      = Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β + Σ_j η_{jj}^2 x_j^2 e^{x_j},

    D^3 F(x)[h,h,h] = (2/2πi) ∮ Tr((zI − x)^{−1} h (zI − x)^{−1} h (zI − x)^{−1} h) e^z dz
      = (2/2πi) ∮ Σ_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} e^z / ((z − x_j)(z − x_k)(z − x_ℓ)) dz
      = 2 Σ_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} Γ(x_j, x_k, x_ℓ),

where

    Γ(a,b,c) = e^a/((a−b)(a−c)) + e^b/((b−a)(b−c)) + e^c/((c−a)(c−b))

for distinct a, b, c, and Γ(a,b,c) = lim_{(a′,b′,c′)→(a,b,c), a′≠b′≠c′≠a′} Γ(a′,b′,c′) otherwise.

Assuming a, b, c distinct, we have

    Γ(a,b,c) = (e^a − e^b)/((a−b)(a−c)) + (e^c − e^b)/((c−a)(c−b))
      = (1/(a−c)) [Σ_{p=1}^∞ (1/p!) Σ_{α+β=p−1} a^α b^β − Σ_{p=1}^∞ (1/p!) Σ_{α+β=p−1} c^α b^β]
      = Σ_{p=2}^∞ (1/p!) Σ_{α+β=p−1, α≥1} b^β (a^α − c^α)/(a − c)
      = Σ_{p=2}^∞ (1/p!) Σ_{µ+ν+β=p−2} a^µ c^ν b^β.

The resulting representation is, of course, valid for all a, b, c. We therefore get

    D^3 F(x)[h,h,h] = 2 Σ_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p−2} x_j^α x_k^β x_ℓ^γ
      = 2 Σ_{p=2}^∞ Σ_{j,k,ℓ} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ.
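The divided-difference identity just derived is easy to verify numerically (Python; ours, not part of the paper): truncating the triple series at a moderate order reproduces Γ(a,b,c) for distinct arguments, and at a = b = c = 0 the series returns 1/2, the value of the second divided difference of exp at 0.

```python
import math

def gamma_exact(a, b, c):
    """Second divided difference of exp at pairwise distinct points a, b, c."""
    return (math.exp(a) / ((a - b) * (a - c))
            + math.exp(b) / ((b - a) * (b - c))
            + math.exp(c) / ((c - a) * (c - b)))

def gamma_series(a, b, c, pmax=40):
    """sum_{p>=2} (1/p!) sum_{mu+nu+beta=p-2} a^mu c^nu b^beta, truncated at pmax."""
    total = 0.0
    for p in range(2, pmax + 1):
        inner = sum(a ** mu * c ** nu * b ** (p - 2 - mu - nu)
                    for mu in range(p - 1)
                    for nu in range(p - 1 - mu))
        total += inner / math.factorial(p)
    return total

val_exact = gamma_exact(1.0, 0.4, -0.3)
val_series = gamma_series(1.0, 0.4, -0.3)
```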
We now have

    Df(x)[h] = (Σ_j e^{x_j} x_j η_{jj}) / (Σ_j e^{x_j}) = Σ_j p_j (x_j η_{jj}) =: µ   [p_j := e^{x_j} / Σ_k e^{x_k}],

    D^2 f(x)[h,h] = −(Σ_j p_j x_j η_{jj})^2
      + (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β + Σ_j x_j^2 η_{jj}^2 e^{x_j}) / Σ_j e^{x_j}
      = Σ_j p_j σ_j^2 + (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β) / Σ_j e^{x_j}   [σ_j := x_j η_{jj} − µ]

and, writing Σ_j p_j x_j^2 η_{jj}^2 = µ^2 + Σ_j p_j σ_j^2 and splitting the triple sum in D^3 F(x)[h,h,h]/F(x) into its diagonal part (j = k = ℓ) and the rest,

    D^3 f(x)[h,h,h]
      = 2µ^3 − 3µ [ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν} + Σ_j p_j x_j^2 η_{jj}^2 ] + D^3 F(x)[h,h,h]/F(x)
      = −µ^3 − 3µ Σ_j p_j σ_j^2 − 3µ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν}
        + 2 (Σ_{p=2}^∞ (1/p!) Σ_j η_{jj}^3 x_j^{p+1} p(p−1)/2)/Σ_ν e^{x_ν}
        + 2 (Σ_{p=2}^∞ Σ_{(j,k,ℓ) not all equal} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ)/Σ_ν e^{x_ν},

where we used that the number of triples (α,β,γ) with α,β,γ ≥ 1 and α+β+γ = p+1 is p(p−1)/2. Since

    2 Σ_{p=2}^∞ x_j^{p+1} p(p−1)/(2 p!) = Σ_{p=2}^∞ x_j^{p+1}/(p−2)! = x_j^3 e^{x_j},

the diagonal part contributes Σ_j p_j η_{jj}^3 x_j^3 = Σ_j p_j (σ_j + µ)^3, and since Σ_j p_j σ_j = 0,

    −µ^3 − 3µ Σ_j p_j σ_j^2 + Σ_j p_j (σ_j + µ)^3 = Σ_j p_j σ_j^3.

Finally, in the remaining triple sum we separate the terms with exactly two distinct indices from those with j, k, ℓ pairwise distinct; a term with two coinciding indices contributes η_{jj} η_{jℓ}^2 x_j^α x_ℓ^β with α ≥ 2, and collecting the three coincidence patterns together with the monomial multiplicities yields the factor 6 and the weight (α − 1). Altogether,

    D^3 f(x)[h,h,h] = Σ_j p_j σ_j^3
      − 3µ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν}
      + 6 (Σ_{p=2}^∞ Σ_{j≠ℓ} Σ_{α+β=p+1, α≥2, β≥1} η_{jj} η_{jℓ}^2 (1/p!)(α−1) x_j^α x_ℓ^β)/Σ_ν e^{x_ν}
      + 2 (Σ_{p=2}^∞ Σ_{j≠k≠ℓ≠j} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ)/Σ_ν e^{x_ν}.
Thus,

    df := Df(x)[h] = Σ_j p_j (x_j η_{jj}) = µ   [p_j = e^{x_j}/Σ_k e^{x_k}],

    d²f := D^2 f(x)[h,h] = R1 + R2,   where
      R1 := Σ_j p_j σ_j^2   [σ_j = x_j η_{jj} − µ],
      R2 := (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_j e^{x_j},

    d³f := D^3 f(x)[h,h,h] = I0 + I1 + J0 + J1,   where
      I0 := Σ_j p_j σ_j^3,
      I1 := −3µ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν},
      J0 := 6 (Σ_{p=2}^∞ Σ_{j≠k} Σ_{α+β=p+1, α≥2, β≥1} η_{jj} η_{jk}^2 (1/p!)(α−1) x_j^α x_k^β)/Σ_ν e^{x_ν},
      J1 := 2 (Σ_{p=2}^∞ Σ_{j≠k≠ℓ≠j} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ)/Σ_ν e^{x_ν}.   (28)
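For diagonal x and diagonal h the off-diagonal contributions R2, I1, J0, J1 vanish (η is then diagonal and x_j η_{jj} = h_{jj}), and (28) reduces to the familiar gradient and Hessian of the log-sum-exp function. A quick finite-difference check of the first two formulas in this special case (Python; ours, not part of the paper):

```python
import math

def f(xs):
    """f(X) = ln Tr(exp X) for X = Diag(xs), i.e. the log-sum-exp function."""
    m = max(xs)
    return m + math.log(sum(math.exp(v - m) for v in xs))

xs = [0.5, 1.2, -0.3]
hs = [0.1, -0.2, 0.05]            # diagonal direction: x_j * eta_jj = h_j
w = [math.exp(v) for v in xs]
pj = [v / sum(w) for v in w]      # the weights p_j of (28)

mu = sum(p * h for p, h in zip(pj, hs))                # df = mu
d2f = sum(p * (h - mu) ** 2 for p, h in zip(pj, hs))   # d2f = R1 (R2 = 0 here)

t = 1e-4
phi = lambda s: f([x + s * h for x, h in zip(xs, hs)])
fd1 = (phi(t) - phi(-t)) / (2 * t)                     # central difference for df
fd2 = (phi(t) - 2 * phi(0.0) + phi(-t)) / t ** 2       # central difference for d2f
```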
The resulting formulas for the derivatives, although established under the assumption that all x_j are distinct, clearly remain valid for all x ≻ 0.

2⁰. Now let x, h satisfy the premise of (27). Then

    0 < x_j ≤ L, j = 1,…,n,  and  −I ⪯ η ⪯ I,   (29)

whence

    |µ| ≤ L,  |σ_j| ≤ 2L.   (30)

It follows that

    |I0| + |I1| ≤ 3L d²f.   (31)
Further, we have

    |J0| ≤ 6 (Σ_{p=2}^∞ Σ_{j≠k} Σ_{α+β=p+1, α≥2, β≥1} |η_{jj}| η_{jk}^2 (1/p!)(α−1) x_j^α x_k^β)/Σ_ν e^{x_ν}
        ≤ 6 (Σ_{p=2}^∞ Σ_{j≠k} Σ_{α+β=p+1, α≥2, β≥1} η_{jk}^2 ((α−1)/p!) x_j^α x_k^β)/Σ_ν e^{x_ν}   [by (29)]
        = 6 (Σ_{q=1}^∞ Σ_{j≠k} Σ_{α′+β=q+1, α′≥1, β≥1} η_{jk}^2 x_j (α′/(q+1)!) x_j^{α′} x_k^β)/Σ_ν e^{x_ν}   [p = q+1, α = α′+1]
        ≤ 6L R2   [due to 0 < x_j ≤ L and α′/(q+1) ≤ 1].   (32)
Now, let ζ be the matrix obtained from η by replacing the diagonal entries with zeros. Then

    −2I ⪯ ζ ⪯ 2I   (33)

and

    J1 = 2 (Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p+1, α,β,γ≥1} Φ(α,β,γ))/Σ_ν e^{x_ν},
    where  Φ(α,β,γ) := Σ_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^β x_ℓ^γ.
Φ(α,β,γ) clearly is symmetric in α, β, γ, which gives the first inequality in the following chain:

    |J1| ≤ 12 (Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p+1, 1≤α,γ≤β} |Σ_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^β x_ℓ^γ|)/Σ_ν e^{x_ν}
        = 12 (Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p+1, 1≤α,γ≤β} |Tr(X^α ζ X^β ζ X^γ ζ)|)/Σ_ν e^{x_ν}   [X = Diag{x_1,…,x_n}]
        ≤ 12 (Σ_{p=2}^∞ (1/p!) Σ_{α+γ≤2(p+1)/3, 1≤α,γ<(p+1)/2} |Tr([X^α ζ X^{(p+1−2α)/2}][X^{(p+1−2γ)/2} ζ X^γ] ζ)|)/Σ_ν e^{x_ν}
        ≤ 24 (Σ_{p=2}^∞ (1/p!) Σ_{α+γ≤2(p+1)/3, 1≤α,γ<(p+1)/2} S_α S_γ)/Σ_ν e^{x_ν}
          [S_α := ‖X^α ζ X^{(p+1−2α)/2}‖_F; the last step is due to −2I ⪯ ζ ⪯ 2I].
We have

    S_α^2 = Σ_{j≠k} x_j^{2α} η_{jk}^2 x_k^{p+1−2α}.

Therefore

    Σ_{1≤α<(p+1)/2} S_α^2 ≤ Σ_{j≠k} Σ_{µ+τ=p+1, µ,τ≥1} η_{jk}^2 x_j^µ x_k^τ;

whence

    |J1| ≤ 24 (Σ_{p=2}^∞ (1/p!) Σ_{1≤α,γ<(p+1)/2} S_α S_γ)/Σ_ν e^{x_ν}
         = 24 (Σ_{p=2}^∞ (1/p!) (Σ_{1≤α<(p+1)/2} S_α)^2)/Σ_ν e^{x_ν}
         ≤ 24 (Σ_{p=2}^∞ (p/p!) Σ_{1≤α<(p+1)/2} S_α^2)/Σ_ν e^{x_ν}
         ≤ 24 (Σ_{q=1}^∞ (1/q!) Σ_{j≠k} Σ_{µ+τ=q+2, µ,τ≥1} η_{jk}^2 x_j^µ x_k^τ)/Σ_ν e^{x_ν}.
Since 0 < x_j ≤ L, we clearly have

    Σ_{q=1}^∞ (1/q!) Σ_{j≠k} Σ_{µ+τ=q+2, µ,τ≥1} η_{jk}^2 x_j^µ x_k^τ ≤ 2L Σ_{q=1}^∞ (1/q!) Σ_{j≠k} Σ_{α+β=q+1, α,β≥1} η_{jk}^2 x_j^α x_k^β,

and we arrive at

    |J1| ≤ 48L R2.   (34)

Combining (31), (32) and (34), we arrive at the relation

    |d³f| ≤ O(1)L d²f,
as claimed.
4.2.2 The log-exp barrier
As before, now that the compatibility result is established, we can state and prove the main theorem for the log-exp construction.
Theorem 4.2 Let K be a closed convex cone with a nonempty interior in R^n, let F_−(u) be a ϑ_−-self-concordant barrier for K, and let A_i : R^n → S^{n_i}, i = 1,…,m, be linear mappings such that

    u ∈ int K ⇒ A_i(u) ≻ 0, i = 1,…,m.

Let us set A(u) = Diag{A_1(u),…,A_m(u)}. Given L > ln N, where N = Σ_i n_i, consider the function

    g(u) := ln(Tr(exp{L A(u)})) : int K → R.

This function is convex and satisfies the relation

    u ∈ int K, u ± du ∈ K, g(u) ≤ L ⇒ |D^3 g(u)[du,du,du]| ≤ O(1)L D^2 g(u)[du,du].   (35)

Consequently, the function

    Φ(u) = −ln(L − g(u)) + O(1)L^2 F_−(u)

is a self-concordant barrier with the parameter

    ϑ = 1 + O(1)L^2 ϑ_−

for the set K(L) := cl{u ∈ int K : g(u) ≤ L}. Moreover, when δ ∈ (0,1) and L ≥ (ln N)/δ, we have

    {u ∈ K : A_i(u) ⪯ (1 − δ) I_{n_i}, i = 1,…,m} ⊆ K(L) ⊆ {u ∈ K : A_i(u) ⪯ I_{n_i}, i = 1,…,m}.   (36)

Proof. Same as in the proof of Proposition 2.1, all we need is to verify (35), which is immediate. Indeed, let u, du satisfy the premise in (35), and let x = L A(u), h = L A(du). Since A is a linear mapping which maps int K into int S^N_+, we have x ≻ 0 and x ± h ⪰ 0. Moreover, from g(u) ≤ L it follows that x ⪯ L I_N. Setting f(y) = ln Tr(exp{y}), we have g(v) = f(L A(v)), whence D^k g(u)[du,…,du] = D^k f(x)[h,…,h], k = 0, 1,…. As we have seen, x, h satisfy the premise in (27); applying Proposition 4.3, we arrive at the conclusion in (35).
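The inclusions (36) rest on the smooth-max property of g: if λ_1,…,λ_N are the eigenvalues of L·A(u), then max_j λ_j ≤ g(u) = ln Σ_j e^{λ_j} ≤ max_j λ_j + ln N. A quick numerical check of this bound (Python; ours, not part of the paper):

```python
import math
import random

def logsumexp(xs):
    """ln sum_j exp(x_j) = ln Tr(exp X) for X = Diag(xs)."""
    m = max(xs)
    return m + math.log(sum(math.exp(v - m) for v in xs))

random.seed(0)
checks = []
for _ in range(200):
    n = random.randint(1, 8)
    xs = [random.uniform(-5.0, 5.0) for _ in range(n)]
    # record (max eigenvalue, smooth max, additive gap ln n)
    checks.append((max(xs), logsumexp(xs), math.log(n)))
```

With xs the eigenvalues of L·A(u), g(u) ≤ L therefore forces λ_max(A_i(u)) ≤ 1, while λ_max(A_i(u)) ≤ 1 − δ together with L ≥ (ln N)/δ gives g(u) ≤ L(1 − δ) + ln N ≤ L; this is the whole content of (36).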
5 A Generalization to Convex Semi-infinite Programming
The reader must have recognized that there are certain uniform structures to the derivatives of the functions which we utilized in this paper. These structures seem critical in securing the necessary inequalities in the barrier calculus of Nesterov and Nemirovski [20], and in turn in obtaining the related barriers with the desired self-concordance properties. In this section, we show that many of these properties generalize to the case when our variable is infinite dimensional.
Let f_α(x) > 0 for all x ∈ R^n_{++} and let µ be a measure on the set of indices T. Let us define

    Φ(x) := (∫ f_α^p(x) µ(dα))^{1/p}.

Then for all 0 ≠ p ∈ R, we have

    DΦ(x)[h] = Φ(x) E_x{S_{x,h}(·)},
      where S_{x,h}(α) = Df_α(x)[h]/f_α(x),  π_x(α) = f_α^p(x)/∫ f_β^p(x) µ(dβ)  and  E_x{g(·)} = ∫ g(α) π_x(α) µ(dα);

    D^2 Φ(x)[h,h] = Φ(x) [ (p−1) E_x{σ_{x,h}^2} + E_x{D^2 f_α(x)[h,h]/f_α(x)} ],
      where σ_{x,h}(α) = S_{x,h}(α) − E_x{S_{x,h}(·)};

    D^3 Φ(x)[h,h,h] = Φ(x) [ E_x{D^3 f_α(x)[h,h,h]/f_α(x)}
      + (p−1) ( (p−2) E_x{σ_{x,h}^3(·)} − 3 E_x{S_{x,h}(·)} E_x{σ_{x,h}^2(·)} + 3 E_x{σ_{x,h}(·) · D^2 f_α(x)[h,h]/f_α(x)} ) ].
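The first-order formula can be checked by finite differences when µ is a counting measure over finitely many indices (Python; ours, not part of the paper; the positive linear forms f_α(x) = ⟨a_α, x⟩ are an arbitrary choice made for the test):

```python
import math

# counting measure over three indices; f_alpha(x) = <a_alpha, x> > 0 on x > 0
A = [(1.0, 2.0), (0.5, 1.5), (2.0, 0.3)]
p = -2.0                       # any p != 0 is admissible
x, h = (1.0, 2.0), (0.3, -0.1)

def fa(a, z):
    return a[0] * z[0] + a[1] * z[1]

def Phi(z):
    return sum(fa(a, z) ** p for a in A) ** (1.0 / p)

vals = [fa(a, x) ** p for a in A]
pi = [v / sum(vals) for v in vals]      # the weights pi_x(alpha)
S = [fa(a, h) / fa(a, x) for a in A]    # S_{x,h}(alpha); here Df_alpha(x)[h] = f_alpha(h)
analytic = Phi(x) * sum(w * s for w, s in zip(pi, S))   # Phi(x) * E_x{S_{x,h}}

t = 1e-6
fd = (Phi((x[0] + t * h[0], x[1] + t * h[1]))
      - Phi((x[0] - t * h[0], x[1] - t * h[1]))) / (2 * t)
```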
Proposition 5.1 Suppose that D^2 f_α(x)[h,h] ≥ 0 for every x ∈ R^n_{++} and for every h ∈ R^n. Let ξ_1 and ξ_2 be given such that

    sup_{x∈G; h: (x±h)≥0} |S_{x,h}| ≤ ξ_1

and

    D^3 f_α(x)[h,h,h] ≤ ξ_2 D^2 f_α(x)[h,h]  for every x ∈ R^n_{++} and every h such that (x ± h) ≥ 0.

Then, for every x ∈ R^n_{++} and every h ∈ R^n such that (x ± h) ≥ 0, we have

    |D^3 Φ(x)[h,h,h]| ≤ max{(2p − 1)ξ_1, 6(p − 1)ξ_1 + ξ_2} D^2 Φ(x)[h,h].

Proof. We simply substitute the bounds given in the assumption part of the statement into the expression for the third derivative (given immediately before the proposition), and the claim of the proposition easily follows.
6 Recovering a Good Dual Solution
In the previous sections we showed how to construct self-concordant barriers for convex approximations of the convex set of main interest G. Once we have such a barrier, we can use the general self-concordance theory, and we immediately have various path-following and potential-reduction interior-point algorithms to optimize a linear function over the approximation.

If we can compute the Legendre-Fenchel conjugate of our barrier efficiently, then we can even apply some primal-dual algorithms (as in [14]). However, if the Legendre-Fenchel conjugate is not available for efficient computation, then we are stuck with primal-only algorithms. Even in such a case, we would be interested in generating good dual solutions. This section is dedicated to showing a way to recover a good dual solution from a good, central primal solution.
Cones M_p. For y ∈ S^m and p ∈ [1, ∞], let |y|_p := ‖λ(y)‖_p. Further, let K be a closed convex cone with nonempty interior in a Euclidean space (E, ⟨·,·⟩) and let P be a linear mapping from E to S^m such that Px ≻ 0 whenever x ∈ int K. Finally, we define

    M_p := {(t, x) : x ∈ K, |Px|_p ≤ t}.

We have (⟨·,·⟩_F stands for the Frobenius inner product on S^m):

    M_p^* = {(τ, ξ) : x ∈ K, |Px|_p ≤ t ⇒ τt + ⟨x, ξ⟩ ≥ 0}
          = {(τ, ξ) : ∃(φ ∈ K^*, η ∈ S^m, σ ∈ R, |η|_q ≤ σ) : ⟨φ, x⟩ + σt − ⟨Px, η⟩_F = tτ + ⟨x, ξ⟩ ∀x, t}   [1/q + 1/p = 1]
          = {(τ, ξ) : ∃(η ∈ S^m, φ ∈ K^*) : ξ = φ − P^*η, |η|_q ≤ τ}.

Since P maps K into S^m_+, P^* maps S^m_+ into K^*; thus, whenever η′ ⪰ η, we have (P^*η′ − P^*η) ∈ K^*. It follows that if ξ = φ − P^*η with φ ∈ K^* and |η|_q ≤ τ, and η_+ is the "positive part" of η, then ξ = (φ + P^*η_+ − P^*η) − P^*η_+, where (φ + P^*η_+ − P^*η) ∈ K^* and |η_+|_q ≤ τ. We arrive at

    M_p^* = {(τ, ξ) : ∃(φ ∈ K^*, η ⪰ 0) : ξ = φ − P^*η, |η|_q ≤ τ}.

Thus, a primal-dual pair of conic problems associated with M_p is

    min_s { e^T s : As − b ∈ K, |P(As − b)|_p ≤ c^T s − d }   (P)

    max_{φ,η,τ} { ⟨φ − P^*η, b⟩ + τd : A^*(φ − P^*η) + τc = e, φ ∈ K^*, |η|_q ≤ τ, η ⪰ 0 }.   (D)
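The step that introduces the conjugate exponent q in the description of M_p^* is Hölder's inequality for eigenvalue norms (a consequence of von Neumann's trace inequality): |⟨x, η⟩_F| ≤ |x|_p |η|_q when 1/p + 1/q = 1. A numerical spot-check on random symmetric 2×2 matrices (Python; ours, not part of the paper):

```python
import math
import random

def eigvals2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return (mean - rad, mean + rad)

def schatten(a, b, c, p):
    """|y|_p = ||lambda(y)||_p for y = [[a, b], [b, c]]."""
    return sum(abs(l) ** p for l in eigvals2(a, b, c)) ** (1.0 / p)

random.seed(1)
p = 3.0
q = p / (p - 1.0)          # conjugate exponent: 1/p + 1/q = 1
gaps = []
for _ in range(300):
    x = [random.uniform(-2.0, 2.0) for _ in range(3)]   # (a, b, c) of x
    y = [random.uniform(-2.0, 2.0) for _ in range(3)]   # (a, b, c) of eta
    inner = x[0] * y[0] + x[2] * y[2] + 2 * x[1] * y[1]  # Frobenius <x, eta>_F
    gaps.append(schatten(*x, p) * schatten(*y, q) - abs(inner))
```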
Here, the data are given by (A, b, c, d, e), where e (no longer a vector of all ones) is arbitrary. Now, let p ≥ 2 be an integer, F be a ϑ-logarithmically homogeneous s.c.b. for K, α ≥ 1 and β = O(1)p^2; then

    Φ_p(t, x) = −α ln(t − |Px|_p) + β^2 F(x)

is a ϑ(p)-logarithmically homogeneous s.c.b. for M_p, with

    ϑ(p) = α + β^2 ϑ.
Let s_ρ be a central solution to (P):

    s_ρ = argmin_s { ρ e^T s + Φ_p(c^T s − d, As − b) }
    ⇓
    x_ρ := As_ρ − b,  t_ρ := c^T s_ρ − d,  ζ_ρ := P x_ρ,  ω_ρ := |ζ_ρ|_p,  ξ_ρ := [ω_ρ^{−1} ζ_ρ]^{p−1}
    ⇓
    ρe − (α/(t_ρ − ω_ρ)) [c − A^* P^* ξ_ρ] + β^2 A^* ∇F(As_ρ − b) = 0
    ⇓
    τ_ρ := α/(ρ(t_ρ − ω_ρ)),  η_ρ := τ_ρ ξ_ρ,  φ_ρ := −(β^2/ρ) ∇F(As_ρ − b),  λ := λ(ξ_ρ)
      [|ω_ρ^{−1} ζ_ρ|_p = 1 ⇒ ‖λ‖_{p/(p−1)} = 1]
    ⇓
    A^*(φ_ρ − P^* η_ρ) + τ_ρ c = e,  φ_ρ ∈ K^*,  η_ρ ⪰ 0,  |η_ρ|_q = τ_ρ ‖λ‖_{p/(p−1)} = τ_ρ.

That is, a central solution s_ρ generates a feasible solution (φ_ρ, η_ρ, τ_ρ) of (D). We have

    ⟨φ_ρ − P^*η_ρ, b⟩ + τ_ρ d
      = ⟨φ_ρ − P^*η_ρ, As_ρ⟩ + τ_ρ c^T s_ρ + ⟨φ_ρ − P^*η_ρ, b − As_ρ⟩ + τ_ρ (d − c^T s_ρ)
      = e^T s_ρ + (1/ρ) [∇Φ_p(c^T s_ρ − d, As_ρ − b)]^T (c^T s_ρ − d, As_ρ − b)
      = e^T s_ρ − ϑ(p)/ρ,

where the first two terms sum to s_ρ^T [A^*(φ_ρ − P^* η_ρ) + τ_ρ c] = s_ρ^T e.
Example. Let Q_i ⪰ 0. Then

    min_λ { −e^T λ : |Σ_i λ_i Q_i|_p ≤ 1, λ ≥ 0 }   [A = Id, b = 0, K = R^n_+, c = 0, d = −1, Pλ = Σ_i λ_i Q_i]   (P)
    ⇓
    max_{φ,τ,η} { −τ : φ_i − Tr(η Q_i) = −1 ∀i, φ ≥ 0, |η|_q ≤ τ, η ⪰ 0 }   (D)
    ⇕
    −min_η { |η|_q : Tr(η Q_i) ≥ 1 ∀i }
    ⇕
    −min_η { |η|_q : Tr(η Q_i) ≥ 1 ∀i, η ⪰ 0 }.
Cones N_p. Let K be a closed convex cone with a nonempty interior in a Euclidean space (E, ⟨·,·⟩) and let P be a linear mapping from E to S^m such that Px ≻ 0 whenever x ∈ int K. For a positive integer p, let

    N_p := cl{(t, x) : t > 0, x ∈ int K, (Tr([Px]^{−p}))^{−1/p} ≥ t}.

We have

    N_p^* = {(τ, ξ) : x ∈ int K, t > 0, |[Px]^{−1}|_p ≤ t^{−1} ⇒ τt + ⟨ξ, x⟩ ≥ 0}
          = {(τ, ξ) : x ∈ int K, t > 0, |t^2 [Px]^{−1}|_p ≤ t ⇒ τt + ⟨ξ, x⟩ ≥ 0}
          = {(τ, ξ) : min_{x,y,t} {τt + ⟨ξ, x⟩ : t^2 [Px]^{−1} ⪯ y, |y|_p ≤ t, x ∈ int K, t > 0} ≥ 0}
          = {(τ, ξ) : min_{x,y,t} {τt + ⟨ξ, x⟩ : [ y  tI ; tI  Px ] ⪰ 0, |y|_p ≤ t, t > 0, x ∈ int K} ≥ 0}
          = {(τ, ξ) : min_{x,y,t} {τt + ⟨ξ, x⟩ : [ y  tI ; tI  Px ] ⪰ 0, |y|_p ≤ t, x ∈ K} ≥ 0}
          = {(τ, ξ) : ∃(α, β, η, γ, σ, φ) : [ α  β^T ; β  η ] ⪰ 0, |γ|_q ≤ σ, φ ∈ K^*,
                      Tr(yα) + 2t Tr(β) + Tr(η Px) + tσ − Tr(yγ) + ⟨φ, x⟩ = τt + ⟨ξ, x⟩ ∀x, y, t}   [1/q + 1/p = 1]
          = {(τ, ξ) : ∃(η ⪰ 0, φ ∈ K^*, σ) : ξ = φ + P^*η, |σ − τ| ≤ 2 max_{α,β} {Tr(β) : [ α  β^T ; β  η ] ⪰ 0, |α|_q ≤ σ}}.
Let us solve the optimization problem

    max_{α,β} { Tr(β) : [ α  β^T ; β  η ] ⪰ 0, |α|_q ≤ σ }.

When solving the problem, we may assume without loss of generality that η is diagonal. In this case, the feasible set of the problem remains invariant under the mappings (α, β) ↦ (GαG, GβG), where G is a diagonal matrix with diagonal entries ±1, and the objective function also remains invariant under these mappings. Since the problem is convex, it follows that the optimal value remains unchanged when α, β are restricted to be invariant with respect to the above transformations (that is, we can assume that α and β are diagonal). In this case, the problem becomes

    Opt = max_{α_i, β_i} { Σ_i β_i : β_i^2 ≤ α_i η_i, Σ_i α_i^{p/(p−1)} ≤ σ^{p/(p−1)} },
where η_i are the eigenvalues of η. We have

    Opt = max_{α_i} { Σ_i η_i^{1/2} α_i^{1/2} : Σ_i α_i^{p/(p−1)} ≤ σ^{p/(p−1)} }
        = max_δ { Σ_i δ_i η_i^{1/2} : ‖δ‖_{2p/(p−1)} ≤ σ^{1/2} }   [δ_i = α_i^{1/2}]
        = σ^{1/2} (Σ_i η_i^{p/(p+1)})^{(p+1)/(2p)}
        = (σ |η|_{p/(p+1)})^{1/2},

where η ∈ S^m_+; note that |η|_r is concave in η ∈ S^m_+ when 0 < r ≤ 1. Thus,

    N_p^* = {(τ, ξ) : ∃(η ⪰ 0, φ ∈ K^*, σ ≥ 0) : ξ = φ + P^*η, |τ − σ| ≤ 2(σ |η|_{p/(p+1)})^{1/2}}
          = {(τ, ξ) : ∃(η ⪰ 0, φ ∈ K^*) : ξ = φ + P^*η, 0 ≤ τ + |η|_{p/(p+1)}},

where the second description follows by minimizing σ − 2(σs)^{1/2} over σ ≥ 0 (with s = |η|_{p/(p+1)}), the minimum being −s, attained at σ = s.
Thus, a primal-dual pair of conic problems associated with N_p is

    min_s { e^T s : As − b ∈ K, c^T s − d ≥ 0, |[P(As − b)]^{−1}|_p ≤ 1/(c^T s − d) }   (P)

    max_{φ,η,τ} { ⟨φ + P^*η, b⟩ + τd : A^*(φ + P^*η) + τc = e, φ ∈ K^*, 0 ≤ τ + |η|_{p/(p+1)}, η ⪰ 0 }.   (D)
Now, let F be a ϑ-logarithmically homogeneous s.c.b. for K, α, γ ≥ 1 and β = O(1)p^2; then

    Ψ_p(t, x) = −α ln(|[Px]^{−1}|_p^{−1} − t) − γ ln t + β^2 F(x)

is a ϑ(p)-logarithmically homogeneous s.c.b. for N_p, with

    ϑ(p) = α + γ + β^2 ϑ.
Let s_ρ be a central solution to (P):

    s_ρ = argmin_s { ρ e^T s + Ψ_p(c^T s − d, As − b) }
    ⇓
    x_ρ := As_ρ − b,  t_ρ := c^T s_ρ − d,  ζ_ρ := [P x_ρ]^{−1},  ω_ρ := |ζ_ρ|_p,  ξ_ρ := [ω_ρ^{−1} ζ_ρ]^{p+1}
    ⇓
    ρe − (α/(ω_ρ^{−1} − t_ρ)) [A^* P^* ξ_ρ − c] − (γ/t_ρ) c + β^2 A^* ∇F(As_ρ − b) = 0
    ⇓
    δ_ρ := α/(ρ(ω_ρ^{−1} − t_ρ)),  τ_ρ := γ/(ρ t_ρ) − δ_ρ,  η_ρ := δ_ρ ξ_ρ,
    φ_ρ := −(β^2/ρ) ∇F(As_ρ − b),  λ := λ(ξ_ρ)   [|ω_ρ^{−1} ζ_ρ|_p = 1 ⇒ ‖λ‖_{p/(p+1)} = 1]
    ⇓
    A^*(φ_ρ + P^* η_ρ) + τ_ρ c = e,  φ_ρ ∈ K^*,  η_ρ ⪰ 0,  |η_ρ|_{p/(p+1)} = δ_ρ ‖λ‖_{p/(p+1)} = δ_ρ ≥ −τ_ρ.

So, a central solution s_ρ generates a feasible solution (φ_ρ, η_ρ, τ_ρ) of (D). We have

    ⟨φ_ρ + P^*η_ρ, b⟩ + τ_ρ d
      = ⟨φ_ρ + P^*η_ρ, As_ρ⟩ + τ_ρ c^T s_ρ + ⟨φ_ρ + P^*η_ρ, b − As_ρ⟩ + τ_ρ (d − c^T s_ρ)
      = e^T s_ρ + (1/ρ) [∇Ψ_p(c^T s_ρ − d, As_ρ − b)]^T (c^T s_ρ − d, As_ρ − b)
      = e^T s_ρ − ϑ(p)/ρ,

where the first two terms sum to s_ρ^T [A^*(φ_ρ + P^* η_ρ) + τ_ρ c] = s_ρ^T e.
7 Lipschitz Continuous Gradients

For the minimization of convex functions with Lipschitz continuous gradients, under certain favorable circumstances, first-order methods can achieve the further improved iteration bound of O(√(L/ε)) (see [19]; also see [13]). So, let us look at a p-norm-type function applied to the eigenvalues of a symmetric matrix. Let p ≥ 3, and let

    H(x) = ‖λ_+(x)‖_p^2,  S(x) = ‖λ_+(x)‖_p^p,

where x is a symmetric matrix, λ_+(x) is the vector with the entries max[0, λ_i(x)], and λ_i(x) are the eigenvalues of x.
Proposition 7.1 The function H(·) is convex and continuously differentiable with Lipschitz continuous gradient; specifically,

    x, y ∈ S^n ⇒ |H′(x) − H′(y)|_q ≤ 2(p−1)|x − y|_p.

Proof. It suffices to verify that H is continuously differentiable, twice continuously differentiable except at the origin, and such that

    x ≠ 0 ⇒ 0 ≤ D^2 H(x)[h,h] ≤ 2(p−1)|h|_p^2 ∀h.   (37)
Let γ be a simple closed curve in the right half-plane which encircles [0, L]. For x ∈ S^n with max_i λ_i(x) < L we have

    S(x) = (1/2πi) ∮_γ z^p Tr((zI − x)^{−1}) dz,

whence, as is immediately seen, S is twice continuously differentiable everywhere, so that H is twice continuously differentiable (except at the origin); moreover, H clearly is continuously differentiable. It is also well-known that H is convex (as a symmetric convex function of the eigenvalues of a symmetric matrix).
Let λ_i be the eigenvalues of x, I = {i : λ_i > 0}, J = {i : λ_i ≤ 0}, let h be a symmetric matrix and let h_{ij} be the entries of h in the orthonormal eigenbasis of x. Then

    DS(x)[h] = (1/2πi) ∮_γ p z^{p−1} Tr((zI − x)^{−1} h) dz = p Σ_{i∈I} λ_i^{p−1} h_{ii},

    D^2 S(x)[h,h] = (1/2πi) ∮_γ p z^{p−1} Tr((zI − x)^{−1} h (zI − x)^{−1} h) dz
      = (1/2πi) ∮_γ p z^{p−1} (Σ_{i,j} h_{ij}^2 /((z − λ_i)(z − λ_j))) dz
      = p Σ_{i,j∈I} h_{ij}^2 (λ_i^{p−1} − λ_j^{p−1})/(λ_i − λ_j) + 2p Σ_{i∈I, j∈J} h_{ij}^2 λ_i^{p−1}/(λ_i − λ_j)
        [where (a^{p−1} − b^{p−1})/(a − b) := (p−1) a^{p−2} when a = b]
      ≤ p(p−1) Σ_{i,j∈I} h_{ij}^2 (λ_i^{p−2} + λ_j^{p−2})/2 + 2p Σ_{i∈I, j∈J} h_{ij}^2 λ_i^{p−2}
        [due to (θ^q − 1)/(θ − 1) ≤ q(θ^{q−1} + 1)/2 for θ, q ≥ 1, and λ_i − λ_j ≥ λ_i for j ∈ J]
      ≤ p(p−1) Σ_{i∈I} λ_i^{p−2} Σ_j h_{ij}^2   [since 2 ≤ p − 1 for p ≥ 3],

whence

    DH(x)[h] = (2/p)(S(x))^{2/p − 1} DS(x)[h] = 2 (S(x))^{2/p − 1} Σ_{i∈I} λ_i^{p−1} h_{ii},

    D^2 H(x)[h,h] = (2/p)((2/p) − 1)(S(x))^{2/p − 2} (DS(x)[h])^2 + (2/p)(S(x))^{2/p − 1} D^2 S(x)[h,h]
      ≤ (2/p)(S(x))^{2/p − 1} D^2 S(x)[h,h].
Since D^2 H(x)[h,h] is homogeneous of degree 0 with respect to x, we may assume when computing D^2 H(x)[h,h] that S(x) = 1, that is, Σ_{i∈I} λ_i^p = 1. In this case, setting η_i := (Σ_j h_{ij}^2)^{1/2}, we get

    D^2 H(x)[h,h] ≤ (2/p) D^2 S(x)[h,h] ≤ 2(p−1) Σ_{i∈I} λ_i^{p−2} η_i^2
      ≤ 2(p−1) (Σ_{i∈I} λ_i^p)^{(p−2)/p} (Σ_{i∈I} η_i^p)^{2/p} ≤ 2(p−1)|h|_p^2,

where the concluding inequality is due to the following observation:

    For h ∈ S^n and p ≥ 2, let η be the vector with entries equal to the Euclidean lengths of the columns of h. Then ‖η‖_p ≤ |h|_p.

Indeed, setting h = v s v^T, where v is orthogonal and s is diagonal with diagonal entries s_i, the Euclidean norms of the columns of h are the same as the Euclidean norms of the columns of s v^T: η_j^2 = Σ_i s_i^2 v_{ji}^2. In other words, the vector [η]^2 := (η_1^2,…,η_n^2) is obtained from the vector [s]^2 := (s_1^2,…,s_n^2) by multiplication by a doubly stochastic matrix. It follows that ‖η‖_p^2 = ‖[η]^2‖_{p/2} ≤ ‖[s]^2‖_{p/2} = ‖s‖_p^2, as claimed.

We have demonstrated (37).
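The column-norm observation at the end of the proof is also easy to probe numerically (Python; ours, not part of the paper; random symmetric 2×2 matrices, eigenvalues by the closed 2×2 formula):

```python
import math
import random

def eigvals2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return (mean - rad, mean + rad)

random.seed(2)
p = 4.0
margins = []
for _ in range(300):
    a, b, c = (random.uniform(-2.0, 2.0) for _ in range(3))
    col = [math.hypot(a, b), math.hypot(b, c)]    # Euclidean lengths of the two columns
    eta_p = sum(v ** p for v in col) ** (1.0 / p)                        # ||eta||_p
    lam_p = sum(abs(l) ** p for l in eigvals2(a, b, c)) ** (1.0 / p)     # |h|_p
    margins.append(lam_p - eta_p)
```

Every margin is nonnegative, in line with the doubly stochastic majorization argument above.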
8 Conclusion and Future Work
There are three clear research directions motivated by this work:
1. Design and analysis of cutting-plane interior-point algorithms based on the self-concordant barriers constructed here. One major advantage of our barriers over those used in the pre-existing work ([1, 2, 8, 12]) is that we do not need to drop any constraints, and the addition of new constraints does not change the barrier parameter ϑ.
2. Further extension of the theory to constraints defined by other partial orders, cones (e.g.,partial orders induced by hyperbolic cones [22]).
3. Improvement of the computational complexity of evaluating f , f ′ and f ′′ for such self-concordant barriers.
References

[1] K. M. Anstreicher, Towards a practical volumetric cutting plane method for convex programming, SIAM J. Optim. 9 (1999) 190–206.

[2] D. S. Atkinson and P. M. Vaidya, A cutting plane algorithm for convex programming that uses analytic centers, Math. Prog. 69 (1995) 1–43.

[3] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization, MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2001.

[4] D. Bienstock, Potential Function Methods for Approximately Solving Linear Programming Problems: Theory and Practice, Kluwer Academic Publishers, Boston, USA, 2002.

[5] D. Bienstock and G. Iyengar, Approximating fractional packings and coverings in O(1/ε) iterations, SIAM J. Comput. 35 (2006) 825–854.

[6] G. Cornuejols, Combinatorial Optimization: Packing and Covering, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 2001.

[7] N. Garg and J. Konemann, Faster and simpler algorithms for multicommodity flow and other fractional packing problems, Proc. 39th Ann. Symp. on FOCS (1998) 300–309.

[8] J.-L. Goffin, Z.-Q. Luo and Y. Ye, Complexity analysis of an interior cutting plane method for convex feasibility problems, SIAM J. Optim. 6 (1996) 638–652.

[9] M. D. Grigoriadis and L. G. Khachiyan, Coordination complexity of parallel price-directive decomposition, Math. Oper. Res. 21 (1996) 321–340.

[10] P. Klein, S. A. Plotkin, C. Stein and E. Tardos, Faster approximation algorithms for the unit capacity concurrent flow problem with applications to routing and finding sparse cuts, SIAM J. Comput. 23 (1994) 466–487.

[11] T. Leighton, F. Makedon, S. A. Plotkin, C. Stein, E. Tardos and S. Tragoudas, Fast approximation algorithms for multicommodity flow problems, J. Comput. System Sci. 50 (1995) 228–243.

[12] J. E. Mitchell and M. J. Todd, Solving combinatorial optimization problems using Karmarkar's algorithm, Math. Prog. 56 (1992) 245–284.

[13] A. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM J. Optim. 15 (2004) 229–251.

[14] A. Nemirovski and L. Tuncel, "Cone-free" primal-dual path-following and potential reduction polynomial time interior-point methods, Math. Prog. 102 (2005) 261–294.

[15] A. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons, New York, 1983.

[16] Yu. Nesterov, Dual extrapolation and its applications for solving variational inequalities and related problems, Math. Prog., to appear.

[17] Yu. Nesterov, Rounding of convex sets and efficient gradient methods for linear programming problems, CORE Discussion Paper 2004/4, Louvain-la-Neuve, Belgium, February 2004.

[18] Yu. Nesterov, Unconstrained convex minimization in relative scale, CORE Discussion Paper 2003/96, Louvain-la-Neuve, Belgium, December 2003.

[19] Yu. Nesterov, Smooth minimization of nonsmooth functions, Math. Prog. 103 (2005) 127–152.

[20] Yu. Nesterov and A. Nemirovskii, Interior Point Polynomial Methods in Convex Programming, SIAM Series in Applied Mathematics, SIAM, Philadelphia, 1994.

[21] S. A. Plotkin, D. B. Shmoys and E. Tardos, Fast approximation algorithms for fractional packing and covering problems, Math. Oper. Res. 20 (1995) 257–301.

[22] J. Renegar, Hyperbolic programs, and their derivative relaxations, Found. Comput. Math. 6 (2006) 59–79.

[23] F. Shahrokhi and D. W. Matula, The maximum concurrent flow problem, J. Assoc. Comput. Mach. 37 (1990) 318–334.