Self-Concordant Barriers for Convex Approximations of
Structured Convex Sets
Levent Tuncel∗ Arkadi Nemirovski†
C & O Research Report: CORR 2007–03, February 22, 2007
Abstract
We show how to approximate the feasible region of structured convex optimization problems by a family of convex sets with explicitly given and efficient (if the accuracy of the approximation is moderate) self-concordant barriers. This approach extends the reach of the modern theory of interior-point methods, and lays the foundation for new ways to treat structured convex optimization problems with a very large number of constraints. Moreover, our approach provides a strong connection from the theory of self-concordant barriers to the combinatorial optimization literature on solving packing and covering problems.
Keywords: convex optimization, self-concordant barriers, semidefinite programming, interior-point methods, packing-covering problems
AMS Subject Classification: 90C25, 90C51, 90C22, 90C05, 52A41, 49M37, 90C59, 90C06
∗([email protected]) Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. Research of this author was supported in part by Discovery Grants from NSERC.
†([email protected]) School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia, 30332-0205 USA. Part of this research was done while this author was an Adjunct Professor at the Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, supported in part by a Discovery Grant from NSERC.
1 Introduction
In modern convex optimization, when we consider polynomial-time algorithms, two families of algorithms stand out:
• interior-point methods,
• ellipsoid method and related first-order methods.
Let G ⊂ R^n be a closed convex set with nonempty interior, and let c ∈ R^n be given. To solve the convex optimization problem

inf {〈c, x〉 : x ∈ G},

modern interior-point methods usually require a computable self-concordant barrier function f for G. Such functions (defined in the next section) completely describe the set G and its boundary in a very special way. The ellipsoid and related first-order methods require an efficient separation oracle for G; such an oracle provides some local information about the set G each time it is called.

The ellipsoid method and the first-order methods require very little global information about G in each iteration. Another important difference between the two families of methods is that interior-point methods usually need f′ and f″ (the first and second derivatives of the self-concordant barrier f) at every iteration, and the elementary operations needed can be as bad as Θ(n³) per iteration. In contrast, the ellipsoid method and the first-order methods can work with O(n²) operations per iteration and sometimes, for structured problems, they require much less than Θ(n²) work per iteration.
If we have a self-concordant barrier for G with barrier parameter ϑ, then O(√ϑ ln(1/ε)) iterations of interior-point methods suffice to produce an ε-optimal solution (we assume that certain scale factors for the input problem are bounded by O(1/ε)). This bound is significantly better than the best bounds of similar nature for the first-order algorithms. Moreover, whenever an application requires a very high accuracy of the solution, the practical performance of the interior-point algorithms is even much better than that of the corresponding first-order methods.
Nevertheless, an important advantage of the first-order methods can be observed at another extreme (low accuracy and extremely large dimension). The first-order methods can be used to solve extremely large-scale problems (those for which performing even a single iteration of the interior-point methods is out of reach of current hardware/software combinations) as long as the required accuracy of the final solution is modest, e.g., 10⁻² or 10⁻⁴. Indeed, in many real applications, it does not even make sense to ask for more accuracy than 10⁻², due to the nature of the problem and the data collected, as well as the final (practical) uses of the approximate solutions.
Another important theoretical property of the ellipsoid method and the related first-order methods is that, in some sense (in the black-box oracle model), for dimension-independent iteration bounds they are optimal. That is, the O(1/ε²) upper bound on the number of iterations of a first-order method which only uses black-box subgradient information cannot be improved ([15]). However, as Nesterov [19] recently showed, utilizing certain knowledge of the structure of the convex optimization problem at hand does help improve this upper bound very significantly, to O(1/ε) (see also [16], [13]).
In combinatorial optimization, one of the most interesting and quite general structures isdescribed by packing and covering problems; see the recent book by Cornuejols [6]. Given an
m-by-n matrix A with only 0,1 entries, and an objective function vector c, the combinatorial optimization problem

max {〈c, x〉 : Ax ≤ e, x ∈ {0, 1}^n}

(where e is the all ones vector of appropriate size) is a packing problem. The combinatorial optimization problem

min {〈c, x〉 : Ax ≥ e, x ∈ {0, 1}^n}

is a covering problem. Both theoretical and practical approaches for solving packing and covering problems usually involve their linear programming relaxations. We will mostly deal with such problems and their generalizations.
Let a_i denote the i-th row of A. Consider the function

H(x) := ln(∑_{i=1}^m exp{〈a_i, x〉}).
The so-called exponential "potential" function H(x) has been used in the context of approximately solving special classes of linear optimization problems (mainly those arising from covering and packing problems and minimum-cost multicommodity flow problems in combinatorial optimization); see [23, 10, 11, 21, 7, 4, 5]. In fact, such an approach proved useful and interesting even in the case of convex optimization problems with special convex functions and block-diagonal structure in the constraints [9]. However, this function is not a self-concordant barrier and, to the best of our knowledge, self-concordant barriers previously played no role in this context.
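As a minimal numerical sketch of the smoothing property behind this potential (the data below is random and purely illustrative, not taken from the text), the following checks the standard bounds max_i 〈a_i, x〉 ≤ H(x) ≤ max_i 〈a_i, x〉 + ln m:

```python
import numpy as np

# Hypothetical data: a small instance with m = 4 nonnegative rows.
rng = np.random.default_rng(0)
A = rng.random((4, 3))          # rows a_i >= 0
x = rng.random(3)

def H(x, A):
    """Exponential potential H(x) = ln(sum_i exp{<a_i, x>})."""
    return np.log(np.sum(np.exp(A @ x)))

m = A.shape[0]
mx = np.max(A @ x)
# Smoothing bounds: max_i <a_i,x> <= H(x) <= max_i <a_i,x> + ln m
assert mx <= H(x, A) <= mx + np.log(m)
```

The gap of at most ln m between H and the max is exactly what drives the choice L > ln m in the log-exp construction of Section 2.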
Also, very good complexity bounds were recently obtained for first-order algorithms for linear programming problems via approximations to the feasible regions which have certain symmetry properties [17].
In contrast to the existing work above, we will show how to approximate the feasible region of the structured optimization problem at hand by a family of convex sets for which we have efficient (if the accuracy of approximation is not too high) self-concordant barriers.
We will construct self-concordant barriers for a convex approximation of G whose parameter ϑ is either independent of, or only logarithmically dependent on, the larger of the problem dimensions (m), but strongly dependent on the goodness of our approximation to G.
Our work serves the following three purposes:
• We lay the foundation to bring the theory of modern interior-point methods (based on the self-concordance theory) closer to the ellipsoid method and the recently proposed first-order methods in terms of worst-case iteration bounds that are independent of the larger of the problem dimensions, but dependent on the desired accuracy ε as a polynomial in 1/ε.
• From a technical point of view, we show a new way of dealing with exponentially many terms in designing self-concordant barriers.

• We make a strong connection from interior-point theory to the combinatorial optimization literature by providing new theoretical results for packing-covering LPs (and their vast, nonpolyhedral generalizations) based on self-concordant barriers.
A very important warning to the reader is about the computability of the barriers. Let m be the larger of the dimensions in the problem data (and n the smaller one). While the barrier parameters will grow at most with ln(m), if m is very large, say m ≈ 2^n, then evaluating our barriers directly from the formulae we give can require Ω(m) (≈ 2^n) work.
The paper is organized as follows. The next section deals with the LP relaxations of packing and covering problems; two families of self-concordant barriers are derived, one based on the exponential function and the other based on the p-norm. Section 3 is very brief and simply points out that it is elementary to replace the nonnegative orthant by a closed convex cone. Section 4 generalizes the results much more significantly, by replacing the linear functionals from Section 2 with linear operators and by replacing the linear inequalities of Section 2 with the partial orders induced by the cone of positive semidefinite matrices. In Section 5, we illustrate that some basic patterns of the first three derivatives of the functions we used can be extended to a semi-infinite dimensional setting. The development up to Section 6 is not primal-dual symmetric; so, in Section 6 we study duality in this context and, based on a technique used by the second author earlier, show how to generate a good dual feasible solution from a good, central primal solution. Finally, in Section 7 we conclude the technical results of the paper with a proposition showing that the square of the matrix p-norm function has Lipschitz continuous gradient with Lipschitz constant 2(p − 1). This fact is useful in improving complexity bounds for convex optimization problems involving such matrix functions.
2 Packing-Covering LPs
In this section, we start our study with the packing-covering LPs. First, we define the well-known notion of self-concordant barriers.
Definition 2.1 Let G ⊂ R^n be a closed convex set with nonempty interior. Then f : int(G) → R is called a self-concordant barrier (s.c.b.) for G with barrier parameter ϑ if the following conditions are satisfied:

• f ∈ C³ and f is strictly convex on int(G);

• for every sequence {x^(k)} ⊂ int(G) such that x^(k) → x ∈ ∂G, f(x^(k)) → +∞;

• |D³f(x)[h, h, h]| ≤ 2 [D²f(x)[h, h]]^{3/2}, ∀x ∈ int(G), ∀h ∈ R^n;

• (Df(x)[h])² ≤ ϑ D²f(x)[h, h], for every x ∈ int(G), h ∈ R^n.
Suppose G is a closed convex cone with nonempty interior. Then an s.c.b. f for G with barrier parameter ϑ is called logarithmically homogeneous if

f(tx) = f(x) − ϑ ln t, ∀x ∈ int(G), ∀t > 0.
Consider the LP relaxation of a packing problem. Then the constraints Ax ≤ e are satisfied iff max_i 〈a_i, x〉 ≤ 1. Roughly stated, we consider two approximations to the latter condition:
1. Log-exp construction:

∑_{i=1}^m exp{L〈a_i, x〉} ≤ exp{L}, for large L;

2. ‖·‖_p-construction:

(∑_{i=1}^m 〈a_i, x〉^p)^{1/p} ≤ 1, for large p.
2.1 The log-exp construction
We begin with the results of approximating the constraint max_i 〈a_i, x〉 ≤ 1 by ∑_i exp{L〈a_i, x〉} ≤ exp{L}, for large L.
Proposition 2.1 Let a_i ∈ R^n_+ \ {0}, i = 1, ..., m, and let

G(L) := {x ∈ R^n_+ : ∑_i exp{L〈a_i, x〉} ≤ exp{L}},  [L > max{ln m, 3/2}];

G(s) := {x ∈ R^n_+ : 〈a_i, x〉 ≤ s, i = 1, ..., m},  [s > 0].

Then

G(1 − (ln m)/L) ⊆ G(L) ⊆ G(1)   (1)

and the function

F_L(x) := − ln(L − ln(∑_i exp{L〈a_i, x〉})) − (2L/3)² F₋(x),   (2)

where

F₋(x) := ∑_{j=1}^n ln x_j,

is a ϑ(L)-self-concordant barrier for G(L), with

ϑ(L) := 1 + (2L/3)² n.   (3)
Proof. 1°. We clearly have

∑_i exp{L〈a_i, x〉} ≤ exp{L} ⇒ exp{L〈a_i, x〉} ≤ exp{L} ∀i ⇒ 〈a_i, x〉 ≤ 1 ∀i,

whence G(L) ⊆ G(1). On the other hand, if x ∈ G(s), then ∑_i exp{L〈a_i, x〉} ≤ m exp{Ls}, so that

s ≤ 1 − (ln m)/L ⇒ x ∈ G(L).

(1) is proved.
2°. Let

H(x) := ln(∑_i exp{〈b_i, x〉}),  b_i := L a_i.

Then

DH(x)[h] = (∑_i exp{〈b_i, x〉}〈b_i, h〉)/(∑_i exp{〈b_i, x〉}) = ∑_i p_i 〈b_i, h〉,  [p_i := exp{〈b_i, x〉}/∑_j exp{〈b_j, x〉}, ∑_i p_i = 1]   (4)
whence

D²H(x)[h, h] = −(∑_i exp{〈b_i, x〉}〈b_i, h〉)²/(∑_i exp{〈b_i, x〉})² + (∑_i exp{〈b_i, x〉}〈b_i, h〉²)/(∑_i exp{〈b_i, x〉})
= −(∑_i p_i 〈b_i, h〉)² + ∑_i p_i 〈b_i, h〉² = ∑_i p_i s_i²,  [s_i := 〈b_i, h〉 − µ, µ := ∑_j p_j 〈b_j, h〉, ∑_i p_i s_i = 0]   (5)

and finally

D³H(x)[h, h, h] = 2(∑_i exp{〈b_i, x〉}〈b_i, h〉)³/(∑_i exp{〈b_i, x〉})³ − 3(∑_i exp{〈b_i, x〉}〈b_i, h〉)(∑_i exp{〈b_i, x〉}〈b_i, h〉²)/(∑_i exp{〈b_i, x〉})² + (∑_i exp{〈b_i, x〉}〈b_i, h〉³)/(∑_i exp{〈b_i, x〉})
= 2(∑_i p_i(s_i + µ))³ − 3(∑_i p_i(s_i + µ))(∑_i p_i(s_i + µ)²) + ∑_i p_i(s_i + µ)³
= 2µ³ − 3µ(µ² + ∑_i p_i s_i²) + ∑_i p_i(s_i³ + 3s_i²µ + 3s_iµ² + µ³)
= 2µ³ − 3µ³ − 3µ∑_i p_i s_i² + ∑_i p_i s_i³ + 3µ∑_i p_i s_i² + µ³ = ∑_i p_i s_i³.   (6)
We arrive at the following observation:
Lemma 2.1 Let x ∈ G(L) ∩ int R^n_+ and let h be such that x ± h ∈ R^n_+. Then

|D³H(x)[h, h, h]| ≤ 2L D²H(x)[h, h].   (7)

Proof. Indeed, if x ± h ∈ R^n_+, then |h_j| ≤ x_j, j = 1, ..., n, whence |〈b_i, h〉| ≤ 〈b_i, x〉 ≤ L (recall that a_i ≥ 0 and x ∈ G(L) ⊆ G(1)). It follows that in the notation of (4) – (6) we have |µ| ≤ L, whence |s_i| ≤ 2L. With this in mind, (7) follows from the concluding relations in (5) and (6).
Now let us use the following result from [20]:
Lemma 2.2 Let

• G₊ ⊂ R^N be a closed convex domain, F₊ be a ϑ₊-self-concordant barrier for G₊, and K be the recessive cone of G₊;

• G₋ be a closed convex domain in R^n and F₋ be a ϑ₋-self-concordant barrier for G₋;

• A : int G₋ → R^N be a C³ mapping such that D²A(x)[h, h] ∈ −K for all x ∈ int G₋ and

∀(x ∈ int G₋, A(x) ∈ int G₊) ∀(h, x ± h ∈ G₋) : D³A(x)[h, h, h] ≤_K −3β D²A(x)[h, h];   (8)

• the set G° := {x ∈ int G₋ : A(x) ∈ int G₊} be nonempty.

Then G° is an open convex domain, and the function

F₊(A(x)) + max[1, β²] F₋(x)

is a self-concordant barrier for cl G° with the parameter

ϑ := ϑ₊ + max[1, β²] ϑ₋.
Note: In [20], relation (8) is assumed to be valid for all x ∈ int G₋. However, the proof presented in [20] in fact requires only the weaker form of the assumption given by (8).
Now let us specialize the data in the statement of Lemma 2.2 as follows:

• G₊ := {t ≥ 0} ⊂ R, F₊(t) := − ln(t) (ϑ₊ = 1, K := R₊);

• G₋ := R^n_+, F₋(x) := − ∑_{j=1}^n ln x_j (ϑ₋ = n);

• A(x) := L − ln(∑_i exp{L〈a_i, x〉}).

When L > ln m, this data clearly satisfies all of the requirements from the premise of Lemma 2.2, except for (8); by Lemma 2.1, the latter requirement also is satisfied, with β = 2L/3. Applying Lemma 2.2, we arrive at the desired result.
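The inclusions (1) can be sampled numerically. The sketch below uses random nonnegative data (purely illustrative, not from the text): points placed just inside G(1 − (ln m)/L) are confirmed to lie in G(L), and points of G(L) are confirmed to lie in G(1):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, L = 8, 3, 10.0              # illustrative sizes; L > max{ln m, 3/2}
A = rng.random((m, n))            # rows a_i >= 0 (hypothetical data)

def in_G_L(x):
    # x in G(L): sum_i exp{L <a_i,x>} <= exp{L}
    return np.sum(np.exp(L * (A @ x))) <= np.exp(L)

def in_G_s(x, s):
    # x in G(s): <a_i,x> <= s for all i
    return np.all(A @ x <= s)

s_inner = 1.0 - np.log(m) / L
for _ in range(200):
    x = rng.random(n)
    x *= 0.99 * s_inner / np.max(A @ x)   # place x just inside G(s_inner)
    assert in_G_s(x, s_inner) and in_G_L(x)    # inner inclusion of (1)
    y = rng.random(n)
    if in_G_L(y):
        assert in_G_s(y, 1.0)                   # outer inclusion G(L) ⊆ G(1)
```

This is only a spot check at sample points, of course; the proof above establishes the inclusions for all x.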
2.2 The ‖ · ‖p-construction
Now, we consider approximating the constraint max_i 〈a_i, x〉 ≤ 1 via a p-norm function.

Proposition 2.2 Let a_i ∈ R^n_+ \ {0}, i = 1, ..., m. For p ≠ 0 let

H(x) := (∑_{i=1}^m 〈a_i, x〉^p)^{1/p},

and let

K(p) := {(x, t) ∈ R^n_+ × R : t ≥ H(x)}, for p ≥ 1;
K(p) := {(x, t) ∈ R^n_+ × R : t ≤ H(x)}, for p ≤ 1, p ≠ 0.

Then

p ≥ 1 ⇒ {(x, t) ∈ R^n_+ × R : max_i 〈a_i, x〉 ≤ t/m^{1/p}} ⊆ K(p) ⊆ {(x, t) ∈ R^n_+ × R : max_i 〈a_i, x〉 ≤ t},
p < 0 ⇒ {(x, t) ∈ R^n_+ × R : min_i 〈a_i, x〉 ≥ t/m^{1/p}} ⊆ K(p) ⊆ {(x, t) ∈ R^n_+ × R : min_i 〈a_i, x〉 ≥ t},   (9)

and the function

F_p(x) := −((2|p − 2| + 3)/3)² F₋(x) − { ln(t − H(x)), for p ≥ 1; ln(H(x) − t), for p ≤ 1, p ≠ 0 },  F₋(x) := ∑_{j=1}^n ln x_j,   (10)

is a ϑ_p-logarithmically homogeneous self-concordant barrier for K(p), with

ϑ_p := 1 + ((2|p − 2| + 3)/3)² n.   (11)
Proof. We proceed as in the proof of Proposition 2.1. As in the latter proof, the only facts to be verified are that

(a) H is convex on int R^n_+ when p ≥ 1, and is concave on int R^n_+ when p ≤ 1, p ≠ 0;

(b) whenever x ∈ int R^n_+ and h is such that x ± h ∈ R^n_+, we have

|D³H(x)[h, h, h]| ≤ (2|p − 2| + 3) |D²H(x)[h, h]|.   (12)
Assume that p ≠ 0. Given x ∈ int R^n_+, a_i ∈ R^n_+ \ {0}, and h ∈ R^n such that x ± h ∈ R^n_+, let us set

p_i(x) := 〈a_i, x〉^p / ∑_j 〈a_j, x〉^p,  δ_i(x) := 〈a_i, h〉/〈a_i, x〉,  µ(x) := ∑_i p_i(x) δ_i(x),  s_i(x) := δ_i(x) − µ(x).

We have

DH(x)[h] = H(x) µ(x);

D²H(x)[h, h] = H(x)[µ²(x) + Dµ(x)[h]]
= H(x)[µ²(x) + ∑_i (p − 1) p_i(x) δ_i²(x) − p µ²(x)]
= H(x)[µ²(x) + (p − 1)∑_i p_i(x) s_i²(x) + (p − 1)µ²(x) − p µ²(x)]
= (p − 1) H(x) ∑_i p_i(x) s_i²(x);

D³H(x)[h, h, h] = H(x)[µ³(x) + µ(x) Dµ(x)[h] + 2µ(x) Dµ(x)[h] + D²µ(x)[h, h]]
= H(x)[µ³(x) + 3µ(x) Dµ(x)[h] + D²µ(x)[h, h]]
= H(x)[µ³(x) + 3µ(x) Dµ(x)[h] − 2p µ(x) Dµ(x)[h] − 2(p − 1)∑_i p_i(x) δ_i³(x) + p(p − 1)∑_i p_i(x) δ_i³(x) − p(p − 1)(∑_i p_i(x) δ_i²(x)) µ(x)]
= H(x)[µ³(x) + (3 − 2p)µ(x)[(p − 1)∑_i p_i(x) δ_i²(x) − p µ²(x)] + (p − 1)(p − 2)∑_i p_i(x) δ_i³(x) − p(p − 1)µ(x)∑_i p_i(x) δ_i²(x)]
= H(x)[[1 − p(3 − 2p)]µ³(x) + (p − 1)(3 − 3p)µ(x)[∑_i p_i(x) s_i²(x) + µ²(x)] + (p − 1)(p − 2)∑_i p_i(x)[s_i³(x) + 3s_i²(x)µ(x) + 3s_i(x)µ²(x) + µ³(x)]]
= H(x)[(p − 1)(p − 2)∑_i p_i(x) s_i³(x) − 3(p − 1)µ(x)∑_i p_i(x) s_i²(x)].

Since a_i ∈ R^n_+, x ∈ int R^n_+ and x ± h ∈ R^n_+, we have |δ_i(x)| ≤ 1, whence |µ(x)| ≤ 1 and |s_i(x)| ≤ 2. We see that H is convex when p ≥ 1, H is concave when p ≤ 1, p ≠ 0, and (12) takes place.
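Pointwise, the inclusions (9) reduce to the elementary sandwich between the p-norm and the max of nonnegative numbers. A small check under random illustrative data (the instance below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 8, 3, 12.0               # illustrative sizes; p >= 1
A = rng.random((m, n))             # rows a_i >= 0 (hypothetical data)
x = rng.random(n)

z = A @ x                          # all <a_i, x> >= 0 here
H_p = np.sum(z ** p) ** (1.0 / p)
mx = np.max(z)
# Pointwise form of the two-sided inclusion (9) for p >= 1:
#   max_i <a_i,x>  <=  H_p(x)  <=  m^{1/p} * max_i <a_i,x>
assert mx <= H_p + 1e-12
assert H_p <= m ** (1.0 / p) * mx + 1e-12
```

As p grows, the factor m^{1/p} tends to 1, i.e., H_p approximates the max constraint more and more tightly, at the cost of a larger compatibility constant in (12).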
Corollary 2.1 For i = 1, ..., m, let a_i ∈ R^n_+ \ {0} and π_i > 0 be such that ∑_i π_i = 1. Then the concave function f(x, t) = 〈a_1, x〉^{π_1} ··· 〈a_m, x〉^{π_m} − t, considered as a mapping from int G₋ := int R^n_+ × R to R, satisfies (8) with G₊ := R₊ and β = 7/3:

x > 0, x ± h ≥ 0 ⇒ |D³f(x)[h, h, h]| ≤ −7 D²f(x)[h, h].
Proof. Indeed, we have seen that if 0 < p < 1 then H_p(x) = (∑_i 〈a_i, x〉^p)^{1/p} satisfies the relation

x > 0, x ± h ≥ 0 ⇒ |D³H_p(x)[h, h, h]| ≤ −(2|p − 2| + 3) D²H_p(x)[h, h].

It remains to note that, as p → 0+, the functions H_p(x)/m^{1/p} converge, uniformly along with their derivatives on every compact subset of int R^n_+, to the function g(x) = (∏_{i=1}^m 〈a_i, x〉)^{1/m}. It follows that the desired inequality is valid when π_1 = ··· = π_m = 1/m. This fact in turn implies that the desired relation is valid when all π_i are rational, which in turn implies the validity of the statement for all π_i > 0 with unit sum.
As a direct consequence of the last corollary, we also have the following fact.
Corollary 2.2 The function

F(x) := − ln(∏_{i=1}^m 〈a_i, x〉^{π_i} − t) − (7/3)² ∑_{j=1}^n ln x_j

is an O(1)n-self-concordant barrier for the convex set

{(t, x) ∈ R₊ × R^n_+ : ∏_{i=1}^m 〈a_i, x〉^{π_i} ≥ t}.
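The inequality of Corollary 2.1 can be checked numerically via the exact directional derivatives of g(x) = ∏_i 〈a_i, x〉^{π_i}, written in the same δ/µ notation as the proofs above. The derivative formulas below are a routine computation (not quoted from the text), and the data is random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 4
A = rng.random((m, n)) + 0.1          # a_i in R^n_+ \ {0} (hypothetical data)
pi = rng.random(m); pi /= pi.sum()    # weights pi_i > 0 summing to 1

x = rng.random(n) + 0.5               # x > 0
h = rng.uniform(-1, 1, n) * x * 0.99  # ensures x ± h >= 0 componentwise

# Directional derivatives of g(x) = prod_i <a_i,x>^{pi_i} along h,
# with delta_i = <a_i,h>/<a_i,x>, mu = sum_i pi_i delta_i, s_i = delta_i - mu:
#   Dg = g*mu,  D^2 g = -g*sum pi_i s_i^2,
#   D^3 g = g*(3*mu*sum pi_i s_i^2 + 2*sum pi_i s_i^3).
g = np.prod((A @ x) ** pi)
delta = (A @ h) / (A @ x)
mu = pi @ delta
s = delta - mu
d2 = -g * (pi @ s**2)
d3 = g * (3 * mu * (pi @ s**2) + 2 * (pi @ s**3))
assert d2 <= 0 and abs(d3) <= -7 * d2 + 1e-12
```

Since |δ_i| ≤ 1 here, |µ| ≤ 1 and |s_i| ≤ 2, so |D³g| ≤ (3 + 4)·g·∑π_i s_i² = −7 D²g, matching the constant in the corollary.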
In many problems from combinatorial optimization, we are usually interested in computing the maximum (or minimum) cardinality of sets satisfying certain criteria, e.g., maximum cardinality stable set, maximum cardinality matching, minimum cardinality node cover, minimum number of colors needed to color a graph. In these cases, a 1/n-approximation to the underlying polytope usually yields an exact algorithm to compute that maximum or minimum value. First, we work with the desired tolerance ε; then we will substitute ε = 1/n. Thus, in our notation above, we need

1 − (ln m)/L ≥ 1 − ε,

which is equivalent to

L ≥ (ln m)/ε.

Therefore, the self-concordance parameter of F_L is

1 + (4 n ln²(m))/(9 ε²),

and in the case of ε = 1/n we arrive at ϑ(L) = O(n⁵). This would imply an iteration bound of O(n^{2.5} ln n).
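A quick back-of-the-envelope computation of ϑ(L) under these substitutions (the concrete n below is an arbitrary illustration, and m ~ 2^n stands in for an exponentially large constraint family, so ln m is proportional to n):

```python
import math

def barrier_parameter(n, m, eps):
    # theta(L) = 1 + (2L/3)^2 * n with L = ln(m)/eps, per the text
    L = math.log(m) / eps
    return 1.0 + (2.0 * L / 3.0) ** 2 * n

# With eps = 1/n and ln m proportional to n, theta grows like n^5:
n = 50
theta = barrier_parameter(n, m=2 ** n, eps=1.0 / n)
ratio = theta / n ** 5    # roughly (4/9) * ln(2)^2, about 0.21
```

With m polynomial in n instead, ln m = O(ln n) and the same formula gives the much smaller ϑ(L) = O(n³ ln² n).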
One distinct advantage of these new self-concordant barriers is that their barrier parameter can be kept fixed as we add cutting planes. Let us fix the desired tolerance ε. In many cutting plane schemes that admit polynomial or pseudo-polynomial complexity analyses, one can bound from above the number of cutting planes that will be generated by the scheme. Let us suppose this upper bound is m, and that m is bounded by a polynomial function of the input. When we construct our barrier F_L (or F_p), we can compute L (or p) using the upper bound m. Then, in a cutting-plane scheme, as we add new constraints, the barrier parameter stays fixed at ϑ(L) (or ϑ_p). This can be a significant advantage, at least in theoretical work on such algorithms.
In what follows, we call R^n_+ the primary domain, and name the set

{x ∈ R^n : 〈a_i, x〉 ≤ 1, i = 1, 2, ..., m}

the secondary domain. Next, we show that the above approach can be widely generalized in terms of each of these domains.
3 Generalization of the Primary Domain
From the outlined proofs it is clear that the above results remain valid when

• the primary domain (the nonnegative orthant R^n_+) is replaced by an arbitrary closed convex pointed cone K with nonempty interior;

• the assumption a_i ∈ R^n_+ \ {0} is replaced by a_i ∈ K* \ {0}, where K* is the cone dual to K,

K* := {s ∈ R^n : 〈x, s〉 ≥ 0, ∀x ∈ K};

• the barrier F₋(x) = −∑_j ln x_j for R^n_+ is replaced by a ϑ₋-self-concordant barrier F for K, and the factor n in (3), (11) is replaced by ϑ₋.
4 Generalization of the Secondary Domain
In addition to the above generalization of the primary domain, we can also generalize each constraint of the secondary domain. For example, for each i, we can replace 〈a_i, x〉 ≤ 1 by A_i(x) ⪯ I, where A_i : R^n → S^{n_i} is a linear map (from R^n to the space of n_i-by-n_i symmetric matrices with real entries) and "⪯" is the partial order induced by the cone of positive semidefinite matrices, S^{n_i}_+, in S^{n_i}. So, A_i(x) ⪯ I means [I − A_i(x)] ∈ S^{n_i}_+. The results of Sections 2 and 3 are included as the special case n_i = 1 for every i = 1, 2, ..., m. We first present the generalization of the p-norm construction.
4.1 The ‖ · ‖p-construction
4.1.1 Compatibility
Let Tr : S^n → R denote the trace and let S^n_{++} denote the interior of S^n_+. We sometimes write x ≻ 0 to mean that x is a symmetric positive definite matrix. We start by establishing the following fact:
Proposition 4.1 Let p, |p| ≥ 2, be an integer. Consider the following functions of x ∈ S^n_{++}:

F(x) := Tr(x^p),  f(x) := (F(x))^{1/p}.

Then f is convex when p ≥ 2, f is concave when p ≤ −2, and

x ≻ 0, x ± h ⪰ 0 ⇒ |D³f(x)[h, h, h]| ≤ O(1)|p| |D²f(x)[h, h]|   (13)

(from now on, all O(1)'s are positive absolute constants).
Proof. 1°. Let us compute the derivatives of F and f. We have

Df(x)[h] = (1/p)(F(x))^{1/p} DF(x)[h]/F(x) = f(x) DF(x)[h]/(p F(x));

D²f(x)[h, h] = f(x)(DF(x)[h]/(p F(x)))² + f(x) D²F(x)[h, h]/(p F(x)) − f(x)(DF(x)[h])²/(p F²(x))
= f(x)[(1 − p)(DF(x)[h]/(p F(x)))² + D²F(x)[h, h]/(p F(x))];

D³f(x)[h, h, h] = f(x)[(1 − p)(DF(x)[h]/(p F(x)))³ + DF(x)[h] D²F(x)[h, h]/(p² F²(x))]
+ f(x)[2(1 − p)(DF(x)[h]/(p² F(x)))[D²F(x)[h, h]/F(x) − (DF(x)[h])²/F²(x)] − DF(x)[h] D²F(x)[h, h]/(p F²(x)) + D³F(x)[h, h, h]/(p F(x))]
= f(x)[(2p − 1)(p − 1)(DF(x)[h]/(p F(x)))³ − 3(p − 1) DF(x)[h] D²F(x)[h, h]/(p² F²(x)) + D³F(x)[h, h, h]/(p F(x))].
Let {e_j} be the orthonormal eigenbasis of x and let x_j denote the corresponding eigenvalues of x (i.e., x e_j = x_j e_j). Further, let h_{kj} = e_j^T h e_k and η_{kj} = x_k^{−1/2} x_j^{−1/2} h_{kj}. We have, assuming all x_j's distinct:

DF(x)[h] = p Tr(x^{p−1} h) = p ∑_j x_j^{p−1} h_{jj} = (1/2πi) ∮ p Tr((zI − x)^{−1} h) z^{p−1} dz;

D²F(x)[h, h] = (1/2πi) ∮ p Tr((zI − x)^{−1} h (zI − x)^{−1} h) z^{p−1} dz = (1/2πi) ∮ p ∑_{j,k} h_{jk}² z^{p−1}/((z − x_j)(z − x_k)) dz
= ∑_{j≠k} h_{jk}² p (x_j^{p−1} − x_k^{p−1})/(x_j − x_k) + ∑_j p(p − 1) h_{jj}² x_j^{p−2};

D³F(x)[h, h, h] = (2/2πi) ∮ p Tr((zI − x)^{−1} h (zI − x)^{−1} h (zI − x)^{−1} h) z^{p−1} dz
= (2/2πi) ∮ p ∑_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} z^{p−1}/((z − x_j)(z − x_k)(z − x_ℓ)) dz = 2p ∑_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} Γ(x_j, x_k, x_ℓ),

where, for distinct a, b, c,

Γ(a, b, c) = a^{p−1}/((a − b)(a − c)) + b^{p−1}/((b − a)(b − c)) + c^{p−1}/((c − a)(c − b)),

and

Γ(a, b, c) = lim_{(a′,b′,c′)→(a,b,c), a′≠b′≠c′≠a′} Γ(a′, b′, c′)

otherwise. Let q = |p|, and let k ≠ j. When p ≥ 2, we have

x_j x_k (x_j^{p−1} − x_k^{p−1})/(x_j − x_k) = ∑_{α+β=q, α,β≥1} x_j^α x_k^β.

When p ≤ −2, we have

(x_j^{p−1} − x_k^{p−1})/(x_j − x_k) = ((1/x_j)^{q+1} − (1/x_k)^{q+1})/(x_j x_k ((1/x_k) − (1/x_j))) = −x_j^{−1} x_k^{−1} ∑_{α+β=q} x_j^{−α} x_k^{−β},

that is, x_j x_k (x_j^{p−1} − x_k^{p−1})/(x_j − x_k) = −∑_{α+β=q} x_j^{−α} x_k^{−β}.
It follows that

f^{−1}(x) Df(x)[h] = (∑_j x_j^p η_{jj})/(∑_j x_j^p) = ∑_j p_j η_{jj} =: µ,  [p_j := x_j^p/∑_k x_k^p];
further, in the case of p ≥ 2 we have

f^{−1}(x) D²f(x)[h, h] = (1 − p)µ² + D²F(x)[h, h]/(p F(x)) = (1 − p)µ² + [∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β + ∑_j (p − 1) η_{jj}² x_j^p]/∑_ℓ x_ℓ^p
= (p − 1)[∑_j p_j η_{jj}² − µ²] + (∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p
= (p − 1)∑_j p_j δ_{jj}² + (∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p,  [δ_{jj} := η_{jj} − µ],
while in the case of p ≤ −2 we have

f^{−1}(x) D²f(x)[h, h] = (1 − p)µ² + D²F(x)[h, h]/(p F(x)) = (1 − p)µ² + [−∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β} + ∑_j (p − 1) η_{jj}² x_j^p]/∑_ℓ x_ℓ^p
= (p − 1)∑_j p_j δ_{jj}² − (∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ℓ x_ℓ^p.
Finally, in the case of p ≥ 2 we have

f^{−1}(x) D³f(x)[h, h, h] = (2p − 1)(p − 1)µ³ − 3(p − 1)µ[(p − 1)∑_j p_j η_{jj}² + (∑_{j≠k} η_{jk}² x_j x_k ∑_{α+β=p−2} x_j^α x_k^β)/∑_ℓ x_ℓ^p] + 2(∑_{j,k,ℓ} x_j x_k x_ℓ η_{jk} η_{kℓ} η_{ℓj} Γ(x_j, x_k, x_ℓ))/∑_ν x_ν^p,

and for distinct positive a, b, c the following relations hold:

Γ(a, b, c) = a^{p−1}/((a−b)(a−c)) + b^{p−1}/((b−c)(b−a)) + c^{p−1}/((c−a)(c−b)) = (a^{p−1} − c^{p−1})/((a−b)(a−c)) + (b^{p−1} − c^{p−1})/((b−c)(b−a))
= ∑_{α+β=p−2} [a^α c^β/(a−b) + b^α c^β/(b−a)] = ∑_{0≤β≤p−2} c^β ∑_{0≤α≤p−2−β} (a^α − b^α)/(a−b)
= ∑_{α+β+γ=p−3} a^α b^β c^γ.

This concluding identity clearly remains valid when not all of a, b, c are distinct. Thus, in the case of p ≥ 2 we have

f^{−1}(x) D³f(x)[h, h, h]
= (2p − 1)(p − 1)µ³ − 3(p − 1)µ[(p − 1)µ² + (p − 1)∑_j p_j δ_{jj}² + (∑_{j≠k} η_{jk}² x_j x_k ∑_{α+β=p−2} x_j^α x_k^β)/∑_ℓ x_ℓ^p] + 2(∑_{j,k,ℓ} x_j x_k x_ℓ η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p−3} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= [−p² + 3p − 2]µ³ − 3(p − 1)²µ ∑_j p_j δ_{jj}² + 2∑_j p_j η_{jj}³ ∑_{α+β+γ=p−3} 1 − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= [−p² + 3p − 2]µ³ − 3(p − 1)²µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j η_{jj}³ − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= [−p² + 3p − 2]µ³ − 3(p − 1)²µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j [µ³ + 3µ²δ_{jj} + 3µδ_{jj}² + δ_{jj}³] − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p
= −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p.
In the case of p ≤ −2 we get Γ(a, b, c) = (1/(abc)) ∑_{α+β+γ=|p|} a^{−α} b^{−β} c^{−γ}, and the resulting formula for f^{−1}(x) D³f(x)[h, h, h] becomes

f^{−1}(x) D³f(x)[h, h, h] = −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ + 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ℓ x_ℓ^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=|p|} x_j^{−α} x_k^{−β} x_ℓ^{−γ})/∑_ν x_ν^p.
The resulting formulas, obtained for the case when all xj are distinct, clearly remain valid for all x ≻ 0.
Thus, for all x ≻ 0 and all h we have, setting f = f(x), df = Df(x)[h], d²f = D²f(x)[h, h], d³f = D³f(x)[h, h, h]:

f^{−1} df = ∑_j p_j η_{jj} =: µ,  [p_j = x_j^p/∑_k x_k^p];

p ≥ 2: f^{−1} d²f = (p − 1)∑_j p_j δ_{jj}² + (∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ν x_ν^p,  [δ_{jj} = η_{jj} − µ];

p ≤ −2: f^{−1} d²f = (p − 1)∑_j p_j δ_{jj}² − (∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ν x_ν^p;

p ≥ 2: f^{−1} d³f = −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ − 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β)/∑_ν x_ν^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ)/∑_ν x_ν^p;

p ≤ −2: f^{−1} d³f = −3(p − 1)µ ∑_j p_j δ_{jj}² + (p − 1)(p − 2)∑_j p_j δ_{jj}³ + 3(p − 1)µ(∑_{j≠k} η_{jk}² ∑_{α+β=|p|} x_j^{−α} x_k^{−β})/∑_ν x_ν^p + 2(∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=|p|} x_j^{−α} x_k^{−β} x_ℓ^{−γ})/∑_ν x_ν^p,   (14)
where x_j are the eigenvalues of x, η_{ij} = x_i^{−1/2}(e_i^T h e_j) x_j^{−1/2}, and the e_i form an orthonormal eigenbasis of x. Note that under the premise of (13) we have

−I ⪯ η ⪯ I.   (15)
2°. In the sequel, we focus on the case of p ≥ 2. The reasoning in the case of p ≤ −2 is similar.

We have the following lemma.
Lemma 4.1 Suppose x ≻ 0 and that (15) holds. Then

(a) |δ_{jj}| ≤ 2;  (b) |µ| ≤ 1;

(c) |∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ| ≤ O(1) p ∑_{j≠k} η_{jk}² ∑_{α+β=p, α,β≥1} x_j^α x_k^β =: O(1) p R.   (16)
Proof. By (15), we have |η_{jj}| ≤ 1. Since µ is a convex combination of the η_{jj}, and δ_{jj} = η_{jj} − µ, (a) and (b) follow.

Let ζ be the matrix obtained from η by replacing the diagonal entries with 0. By (15), we have

−2I ⪯ ζ ⪯ 2I.   (17)
We now have

∑_{(j≠k)|(j≠ℓ)} η_{jk} η_{kℓ} η_{ℓj} ∑_{α+β+γ=p, α,β,γ≥1} x_j^α x_k^β x_ℓ^γ = 3∑_{k≠j} η_{jj} η_{jk}² ∑_{α+β+γ=p, α,β,γ≥1} x_j^{α+β} x_k^γ [=: I₁] + ∑_{α+β+γ=p, α,β,γ≥1} ∑_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^β x_ℓ^γ [=: I₂, with inner sum denoted Ψ(α, β, γ)].   (18)

We also have

|I₁| ≤ ∑_{k≠j} |η_{jj}| η_{jk}² ∑_{α+β+γ=p, α,β,γ≥1} x_j^{α+β} x_k^γ ≤ ∑_{α+β+γ=p, α,β,γ≥1} ∑_{j≠k} η_{jk}² x_j^{α+β} x_k^γ ≤ (p − 2)∑_{1≤γ<p} ∑_{j≠k} η_{jk}² x_j^{p−γ} x_k^γ,   (19)
where the second inequality is given by (15). Further, Ψ(α, β, γ) clearly is symmetric in its arguments, which gives the first inequality in the following chain (where X = Diag{x₁, ..., x_n} and ‖z‖_F is the Frobenius norm of a matrix z):

|I₂| ≤ ∑_{α+β+γ=p, 1≤α≤β≥γ≥1} 6 |∑_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^{(p−2α)/2} x_k^{(p−2γ)/2} x_ℓ^γ|
= 6∑_{1≤α,γ<p/2} |Tr([X^α ζ X^{(p−2α)/2}][X^{(p−2γ)/2} ζ X^γ] ζ)|
≤ 12∑_{1≤α,γ<p/2} ‖X^α ζ X^{(p−2α)/2}‖_F ‖X^{(p−2γ)/2} ζ X^γ‖_F  [by (17)]
≤ 12∑_{1≤α,γ<p/2} √S_α √S_γ  [S_α := ∑_{µ,ν} x_µ^{2α} ζ_{µν}² x_ν^{p−2α}; note that ∑_{1≤α<p/2} S_α ≤ R]
= 12(∑_{1≤α<p/2} S_α^{1/2})² ≤ 6p ∑_{1≤α<p/2} S_α ≤ 6pR.   (20)
Combining (19) and (20), we arrive at (16)(c).

3°. Combining (16) with (14), we arrive at the desired inequality (13).

The following statement is the matrix analogue of Corollary 2.1.
Proposition 4.2 Let f(x) = Det^{1/m}(x), where x ∈ S^m_{++}. Then f is concave on S^m_{++} and

x ≻ 0, x ± h ⪰ 0 ⇒ |D³f(x)[h, h, h]| ≤ −7 D²f(x)[h, h].   (21)
Proof. Setting H(x) = ln Det(x), x ≻ 0, we have

f(x) = exp{H(x)/m},
Df(x)[h] = f(x)(m^{−1} DH(x)[h]) = f(x)(m^{−1} Tr(x^{−1} h)),
D²f(x)[h, h] = f(x)(m^{−1} DH(x)[h])² + f(x)(m^{−1} D²H(x)[h, h]) = f(x)[(m^{−1} Tr(x^{−1} h))² − m^{−1} Tr(x^{−1} h x^{−1} h)],
D³f(x)[h, h, h] = f(x)(m^{−1} DH(x)[h])[(m^{−1} Tr(x^{−1} h))² − m^{−1} Tr(x^{−1} h x^{−1} h)] + f(x)[−2(m^{−1} Tr(x^{−1} h))(m^{−1} Tr(x^{−1} h x^{−1} h)) + 2 m^{−1} Tr(x^{−1} h x^{−1} h x^{−1} h)].

Setting η = x^{−1/2} h x^{−1/2} and denoting by λ(u) the vector of eigenvalues of u ∈ S^n, by E{g} the average of the coordinates of a vector g, and by [g]^k, g being a vector, the vector with coordinates g_i^k, we get

Df(x)[h] = f(x) E{λ(η)} = f(x)µ  [µ := E{λ(η)}],
D²f(x)[h, h] = f(x)[µ² − E{[λ(η)]²}] = −f(x) E{[σ]²}  [σ_i := λ_i(η) − µ],
D³f(x)[h, h, h] = f(x)[µ³ − 3µ E{[λ(η)]²} + 2 E{[λ(η)]³}]
= f(x)[µ³ − 3µ[µ² + E{[σ]²}] + 2 E{µ³ e + 3µ²σ + 3µ[σ]² + [σ]³}]
= f(x)[3µ E{[σ]²} + 2 E{[σ]³}].

Under the premise in (21), we have ‖λ(η)‖_∞ ≤ 1, whence |µ| ≤ 1 and ‖σ‖_∞ ≤ 2, which, in view of the above formulas for the derivatives of f, immediately implies the conclusion in (21).
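The conclusion (21) can be verified numerically via the eigenvalue formulas from the proof. The sketch below uses a random positive definite matrix and a random symmetric direction, scaled so that x ± h ⪰ 0 (all data illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mdim = 6
B = rng.random((mdim, mdim))
x = B @ B.T + mdim * np.eye(mdim)      # x ≻ 0 (illustrative)
C = rng.random((mdim, mdim))
h = (C + C.T) / 2

# Scale so the eigenvalues of eta = x^{-1/2} h x^{-1/2} lie in [-1, 1]
# (we use 0.99 for strictness), i.e. x ± h ⪰ 0.
w, V = np.linalg.eigh(x)
x_mhalf = V @ np.diag(w ** -0.5) @ V.T
lam = np.linalg.eigvalsh(x_mhalf @ h @ x_mhalf)
lam = lam * (0.99 / np.max(np.abs(lam)))   # same as rescaling h

# Directional derivatives of f = Det^{1/m} in the proof's notation.
f = np.linalg.det(x) ** (1.0 / mdim)
mu = lam.mean()
sig = lam - mu
d2 = -f * np.mean(sig**2)
d3 = f * (3 * mu * np.mean(sig**2) + 2 * np.mean(sig**3))
assert d2 <= 0 and abs(d3) <= -7 * d2 + 1e-12
```

With |µ| ≤ 1 and ‖σ‖_∞ ≤ 2, the proof's bound |3µE{σ²} + 2E{σ³}| ≤ 7E{σ²} is exactly what the final assertion tests.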
4.1.2 The ‖ · ‖p-barrier
Now, we are ready to state and prove the main result for the matrix generalization of the p-norm construction.
Theorem 4.1 Let K be a closed convex cone with nonempty interior in R^n, let F₋(u) be a ϑ₋-self-concordant barrier for K, and let A_i : R^n → S^{n_i}, i = 1, ..., m, be linear mappings such that

u ∈ int K ⇒ A_i(u) ≻ 0, i = 1, ..., m.

Let us set

A(u) = Diag{A₁(u), ..., A_m(u)}.

(i) Given an integer p ≥ 2, consider the function

g(u) := (Tr([A(u)]^p))^{1/p} : int K → R.

This function is convex and O(1)p-compatible with its domain:

u ∈ int K, u ± du ∈ K ⇒ |D³g(u)[du, du, du]| ≤ O(1) p D²g(u)[du, du],   (22)

so that the function

Φ(t, u) = − ln(t − g(u)) + O(1) p² F₋(u)

is a self-concordant barrier with the parameter

ϑ = 1 + O(1) p² ϑ₋

for the cone

K(p) = {(t, u) ∈ R₊ × K : g(u) ≤ t}.

Moreover, with

θ = (∑_{i=1}^m n_i)^{1/p},

we have

M_θ ⊆ K(p) ⊆ M₁,   (23)

where M_r is the cone

{(t, u) : u ∈ K, A_i(u) ⪯ r^{−1} t I_{n_i}, i = 1, ..., m}.
(ii) Given an integer p ≤ −2, consider the function

g(u) := (Tr([A(u)]^p))^{1/p} : int K → R.

This function is concave and O(1)|p|-compatible with its domain:

u ∈ int K, u ± du ∈ K ⇒ |D³g(u)[du, du, du]| ≤ −O(1)|p| D²g(u)[du, du],   (24)

so that the function

Φ(t, u) = − ln(g(u) − t) + O(1) p² F₋(u)

is a self-concordant barrier with the parameter

ϑ = 1 + O(1) p² ϑ₋

for the cone

K(p) = {(t, u) ∈ R₊ × K : g(u) ≥ t}.

Moreover, with

θ = (∑_{i=1}^m n_i)^{1/|p|},

we have

N₁ ⊆ K(p) ⊆ N_θ,   (25)

where N_r is the cone

{(t, u) : u ∈ K, A_i(u) ⪰ θ r^{−1} t I_{n_i}, i = 1, ..., m}.
Proof. All we need is to prove (22) and (24); the statements on self-concordance of Φ(·) are direct consequences of the former relations (see [20]), and the inclusions (23), (25) are evident.

To prove (22) (the proof of (24) is similar), let N := ∑_i n_i and let f(y) := (Tr(y^p))^{1/p} : S^N_{++} → R. Assuming that u, du satisfy the premise in (22), let us set x := A(u), h := A(du). Since A(·) is a linear mapping which maps int K into S^N_{++}, we have

x ≻ 0, x ± h ⪰ 0,   (26)

whence, by Proposition 4.1,

|D³f(x)[h, h, h]| ≤ O(1) p D²f(x)[h, h].

Since D^κ g(u)[du, ..., du] = D^κ f(x)[h, ..., h], we see that the conclusion in (22) indeed is true.
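A small numerical sketch of the sandwich (23) at a fixed point: with hypothetical block sizes and random positive definite blocks, the inclusion reduces, pointwise, to the p-norm/max sandwich on the eigenvalues of the block-diagonal matrix A(u):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 8                                    # integer p >= 2
sizes = (2, 3, 4)                        # hypothetical block sizes n_i
blocks = []
for ni in sizes:
    B = rng.random((ni, ni))
    blocks.append(B @ B.T + 0.1 * np.eye(ni))   # A_i(u) ≻ 0 at some fixed u

# A(u) = Diag{A_1(u), ..., A_m(u)}
N = sum(sizes)
Au = np.zeros((N, N))
ofs = 0
for blk in blocks:
    k = blk.shape[0]
    Au[ofs:ofs + k, ofs:ofs + k] = blk
    ofs += k

lam = np.linalg.eigvalsh(Au)             # eigenvalues of the block-diagonal matrix
g = np.sum(lam ** p) ** (1.0 / p)        # g(u) = (Tr([A(u)]^p))^{1/p}
# Pointwise form of (23): lam_max <= g <= N^{1/p} * lam_max
assert lam.max() <= g + 1e-9
assert g <= N ** (1.0 / p) * lam.max() + 1e-9
```

The factor θ = N^{1/p} plays the same role here as m^{1/p} did in Proposition 2.2.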
4.2 The log-exp construction
Our log-exp construction for packing and covering LPs also generalizes to the matrix case. In what follows, we use the matrix exponential; that is, for x ∈ S^n,

exp{x} := ∑_{k=0}^∞ (1/k!) x^k.
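As a small sanity check of this definition (random symmetric data, illustrative only), truncating the series and comparing Tr(exp{x}) with the sum of exp of the eigenvalues — the latter being how the quantity F(x) = Tr(exp{x}) of Proposition 4.3 is computed in the eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.random((4, 4))
x = (B + B.T) / 2                       # symmetric x (illustrative)

# Matrix exponential by truncating the series exp{x} = sum_k x^k / k!
E = np.eye(4)
term = np.eye(4)
for k in range(1, 40):
    term = term @ x / k
    E = E + term

# For symmetric x, Tr(exp{x}) equals the sum of exp of the eigenvalues of x.
lam = np.linalg.eigvalsh(x)
assert np.isclose(np.trace(E), np.sum(np.exp(lam)))
```

Forty terms are far more than enough here; in general the truncation length needed grows with ‖x‖.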
We first begin with the compatibility result.
4.2.1 Compatibility
Proposition 4.3 Let

F(x) := Tr(exp{x}),  f(x) := ln F(x)  [x ∈ S^n].

Then

(L I ⪰ x ≻ 0, −x ⪯ h ⪯ x) ⇒ |D³f(x)[h, h, h]| ≤ O(1) L D²f(x)[h, h].   (27)
Proof. 1°. Let us compute the derivatives of f(x), assuming x ≻ 0. We have

Df(x)[h] = DF(x)[h]/F(x) = Tr(exp{x} h)/Tr(exp{x});

D²f(x)[h, h] = −(DF(x)[h]/F(x))² + D²F(x)[h, h]/F(x);

D³f(x)[h, h, h] = 2(DF(x)[h]/F(x))³ − 3 DF(x)[h] D²F(x)[h, h]/F²(x) + D³F(x)[h, h, h]/F(x).
Let x ≻ 0, let {e_j} be an orthonormal eigenbasis of x and let x_j be the corresponding eigenvalues, as before. Also let h_{kj} = e_k^T h e_j, and finally let η_{kj} = x_k^{−1/2} x_j^{−1/2} h_{kj}. We have, assuming all x_j's are distinct:

    DF(x)[h] = Tr(exp{x} h) = (1/2πi) ∮ Tr((zI − x)^{−1} h) e^z dz = Σ_j e^{x_j} h_{jj} = Σ_j η_{jj} x_j e^{x_j},

    D^2 F(x)[h,h] = (1/2πi) ∮ Tr((zI − x)^{−1} h (zI − x)^{−1} h) e^z dz
      = (1/2πi) ∮ Σ_{j,k} h_{jk}^2 e^z / ((z − x_j)(z − x_k)) dz
      = Σ_{j≠k} h_{jk}^2 (e^{x_j} − e^{x_k})/(x_j − x_k) + Σ_j h_{jj}^2 e^{x_j}
      = Σ_{p=1}^∞ Σ_{j≠k} h_{jk}^2 (1/p!) Σ_{α+β=p−1} x_j^α x_k^β + Σ_j h_{jj}^2 e^{x_j}
      = Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β + Σ_j η_{jj}^2 x_j^2 e^{x_j},

    D^3 F(x)[h,h,h] = (2/2πi) ∮ Tr((zI − x)^{−1} h (zI − x)^{−1} h (zI − x)^{−1} h) e^z dz
      = (2/2πi) ∮ Σ_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} e^z / ((z − x_j)(z − x_k)(z − x_ℓ)) dz
      = 2 Σ_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} Γ(x_j, x_k, x_ℓ),

where

    Γ(a,b,c) = e^a/((a−b)(a−c)) + e^b/((b−a)(b−c)) + e^c/((c−a)(c−b))

for distinct a, b, c, and Γ(a,b,c) = lim_{(a′,b′,c′)→(a,b,c), a′≠b′≠c′≠a′} Γ(a′,b′,c′) otherwise.

Assuming a, b, c distinct, we have

    Γ(a,b,c) = (e^a − e^b)/((a−b)(a−c)) + (e^c − e^b)/((c−a)(c−b))
      = (1/(a−c)) [Σ_{p=1}^∞ (1/p!) Σ_{α+β=p−1} a^α b^β − Σ_{p=1}^∞ (1/p!) Σ_{α+β=p−1} c^α b^β]
      = Σ_{p=2}^∞ (1/p!) Σ_{α+β=p−1, α≥1} b^β (a^α − c^α)/(a − c)
      = Σ_{p=2}^∞ (1/p!) Σ_{µ+ν+β=p−2} a^µ c^ν b^β.

The resulting representation is, of course, valid for all a, b, c. We therefore get

    D^3 F(x)[h,h,h] = 2 Σ_{j,k,ℓ} h_{jk} h_{kℓ} h_{ℓj} Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p−2} x_j^α x_k^β x_ℓ^γ
      = 2 Σ_{p=2}^∞ Σ_{j,k,ℓ} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ.
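The divided-difference identity just derived is easy to verify numerically (Python; ours, not part of the paper): truncating the triple series at a moderate order reproduces Γ(a,b,c) for distinct arguments, and at a = b = c = 0 the series returns 1/2, the value of the second divided difference of exp at 0.

```python
import math

def gamma_exact(a, b, c):
    """Second divided difference of exp at pairwise distinct points a, b, c."""
    return (math.exp(a) / ((a - b) * (a - c))
            + math.exp(b) / ((b - a) * (b - c))
            + math.exp(c) / ((c - a) * (c - b)))

def gamma_series(a, b, c, pmax=40):
    """sum_{p>=2} (1/p!) sum_{mu+nu+beta=p-2} a^mu c^nu b^beta, truncated at pmax."""
    total = 0.0
    for p in range(2, pmax + 1):
        inner = sum(a ** mu * c ** nu * b ** (p - 2 - mu - nu)
                    for mu in range(p - 1)
                    for nu in range(p - 1 - mu))
        total += inner / math.factorial(p)
    return total

val_exact = gamma_exact(1.0, 0.4, -0.3)
val_series = gamma_series(1.0, 0.4, -0.3)
```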
We now have

    Df(x)[h] = (Σ_j e^{x_j} x_j η_{jj}) / (Σ_j e^{x_j}) = Σ_j p_j (x_j η_{jj}) =: µ   [p_j := e^{x_j} / Σ_k e^{x_k}],

    D^2 f(x)[h,h] = −(Σ_j p_j x_j η_{jj})^2
      + (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β + Σ_j x_j^2 η_{jj}^2 e^{x_j}) / Σ_j e^{x_j}
      = Σ_j p_j σ_j^2 + (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β) / Σ_j e^{x_j}   [σ_j := x_j η_{jj} − µ]

and, writing Σ_j p_j x_j^2 η_{jj}^2 = µ^2 + Σ_j p_j σ_j^2 and splitting the triple sum in D^3 F(x)[h,h,h]/F(x) into its diagonal part (j = k = ℓ) and the rest,

    D^3 f(x)[h,h,h]
      = 2µ^3 − 3µ [ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν} + Σ_j p_j x_j^2 η_{jj}^2 ] + D^3 F(x)[h,h,h]/F(x)
      = −µ^3 − 3µ Σ_j p_j σ_j^2 − 3µ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν}
        + 2 (Σ_{p=2}^∞ (1/p!) Σ_j η_{jj}^3 x_j^{p+1} p(p−1)/2)/Σ_ν e^{x_ν}
        + 2 (Σ_{p=2}^∞ Σ_{(j,k,ℓ) not all equal} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ)/Σ_ν e^{x_ν},

where we used that the number of triples (α,β,γ) with α,β,γ ≥ 1 and α+β+γ = p+1 is p(p−1)/2. Since

    2 Σ_{p=2}^∞ x_j^{p+1} p(p−1)/(2 p!) = Σ_{p=2}^∞ x_j^{p+1}/(p−2)! = x_j^3 e^{x_j},

the diagonal part contributes Σ_j p_j η_{jj}^3 x_j^3 = Σ_j p_j (σ_j + µ)^3, and since Σ_j p_j σ_j = 0,

    −µ^3 − 3µ Σ_j p_j σ_j^2 + Σ_j p_j (σ_j + µ)^3 = Σ_j p_j σ_j^3.

Finally, in the remaining triple sum we separate the terms with exactly two distinct indices from those with j, k, ℓ pairwise distinct; a term with two coinciding indices contributes η_{jj} η_{jℓ}^2 x_j^α x_ℓ^β with α ≥ 2, and collecting the three coincidence patterns together with the monomial multiplicities yields the factor 6 and the weight (α − 1). Altogether,

    D^3 f(x)[h,h,h] = Σ_j p_j σ_j^3
      − 3µ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν}
      + 6 (Σ_{p=2}^∞ Σ_{j≠ℓ} Σ_{α+β=p+1, α≥2, β≥1} η_{jj} η_{jℓ}^2 (1/p!)(α−1) x_j^α x_ℓ^β)/Σ_ν e^{x_ν}
      + 2 (Σ_{p=2}^∞ Σ_{j≠k≠ℓ≠j} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ)/Σ_ν e^{x_ν}.
Thus,

    df := Df(x)[h] = Σ_j p_j (x_j η_{jj}) = µ   [p_j = e^{x_j}/Σ_k e^{x_k}],

    d²f := D^2 f(x)[h,h] = R1 + R2,   where
      R1 := Σ_j p_j σ_j^2   [σ_j = x_j η_{jj} − µ],
      R2 := (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_j e^{x_j},

    d³f := D^3 f(x)[h,h,h] = I0 + I1 + J0 + J1,   where
      I0 := Σ_j p_j σ_j^3,
      I1 := −3µ (Σ_{p=1}^∞ Σ_{j≠k} Σ_{α+β=p+1, α,β≥1} η_{jk}^2 (1/p!) x_j^α x_k^β)/Σ_ν e^{x_ν},
      J0 := 6 (Σ_{p=2}^∞ Σ_{j≠k} Σ_{α+β=p+1, α≥2, β≥1} η_{jj} η_{jk}^2 (1/p!)(α−1) x_j^α x_k^β)/Σ_ν e^{x_ν},
      J1 := 2 (Σ_{p=2}^∞ Σ_{j≠k≠ℓ≠j} Σ_{α+β+γ=p+1, α,β,γ≥1} η_{jk} η_{kℓ} η_{ℓj} (1/p!) x_j^α x_k^β x_ℓ^γ)/Σ_ν e^{x_ν}.   (28)
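For diagonal x and diagonal h the off-diagonal contributions R2, I1, J0, J1 vanish (η is then diagonal and x_j η_{jj} = h_{jj}), and (28) reduces to the familiar gradient and Hessian of the log-sum-exp function. A quick finite-difference check of the first two formulas in this special case (Python; ours, not part of the paper):

```python
import math

def f(xs):
    """f(X) = ln Tr(exp X) for X = Diag(xs), i.e. the log-sum-exp function."""
    m = max(xs)
    return m + math.log(sum(math.exp(v - m) for v in xs))

xs = [0.5, 1.2, -0.3]
hs = [0.1, -0.2, 0.05]            # diagonal direction: x_j * eta_jj = h_j
w = [math.exp(v) for v in xs]
pj = [v / sum(w) for v in w]      # the weights p_j of (28)

mu = sum(p * h for p, h in zip(pj, hs))                # df = mu
d2f = sum(p * (h - mu) ** 2 for p, h in zip(pj, hs))   # d2f = R1 (R2 = 0 here)

t = 1e-4
phi = lambda s: f([x + s * h for x, h in zip(xs, hs)])
fd1 = (phi(t) - phi(-t)) / (2 * t)                     # central difference for df
fd2 = (phi(t) - 2 * phi(0.0) + phi(-t)) / t ** 2       # central difference for d2f
```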
The resulting formulas for the derivatives, although established under the assumption that all x_j are distinct, clearly remain valid for all x ≻ 0.

2⁰. Now let x, h satisfy the premise of (27). Then

    0 < x_j ≤ L, j = 1,…,n,  and  −I ⪯ η ⪯ I,   (29)

whence

    |µ| ≤ L,  |σ_j| ≤ 2L.   (30)

It follows that

    |I0| + |I1| ≤ 3L d²f.   (31)
Further, we have

    |J0| ≤ 6 (Σ_{p=2}^∞ Σ_{j≠k} Σ_{α+β=p+1, α≥2, β≥1} |η_{jj}| η_{jk}^2 (1/p!)(α−1) x_j^α x_k^β)/Σ_ν e^{x_ν}
        ≤ 6 (Σ_{p=2}^∞ Σ_{j≠k} Σ_{α+β=p+1, α≥2, β≥1} η_{jk}^2 ((α−1)/p!) x_j^α x_k^β)/Σ_ν e^{x_ν}   [by (29)]
        = 6 (Σ_{q=1}^∞ Σ_{j≠k} Σ_{α′+β=q+1, α′≥1, β≥1} η_{jk}^2 x_j (α′/(q+1)!) x_j^{α′} x_k^β)/Σ_ν e^{x_ν}   [p = q+1, α = α′+1]
        ≤ 6L R2   [due to 0 < x_j ≤ L and α′/(q+1) ≤ 1].   (32)
Now, let ζ be the matrix obtained from η by replacing the diagonal entries with zeros. Then

    −2I ⪯ ζ ⪯ 2I   (33)

and

    J1 = 2 (Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p+1, α,β,γ≥1} Φ(α,β,γ))/Σ_ν e^{x_ν},
    where  Φ(α,β,γ) := Σ_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^β x_ℓ^γ.
Φ(α,β,γ) clearly is symmetric in α, β, γ, which gives the first inequality in the following chain:

    |J1| ≤ 12 (Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p+1, 1≤α,γ≤β} |Σ_{j,k,ℓ} ζ_{jk} ζ_{kℓ} ζ_{ℓj} x_j^α x_k^β x_ℓ^γ|)/Σ_ν e^{x_ν}
        = 12 (Σ_{p=2}^∞ (1/p!) Σ_{α+β+γ=p+1, 1≤α,γ≤β} |Tr(X^α ζ X^β ζ X^γ ζ)|)/Σ_ν e^{x_ν}   [X = Diag{x_1,…,x_n}]
        ≤ 12 (Σ_{p=2}^∞ (1/p!) Σ_{α+γ≤2(p+1)/3, 1≤α,γ<(p+1)/2} |Tr([X^α ζ X^{(p+1−2α)/2}][X^{(p+1−2γ)/2} ζ X^γ] ζ)|)/Σ_ν e^{x_ν}
        ≤ 24 (Σ_{p=2}^∞ (1/p!) Σ_{α+γ≤2(p+1)/3, 1≤α,γ<(p+1)/2} S_α S_γ)/Σ_ν e^{x_ν}
          [S_α := ‖X^α ζ X^{(p+1−2α)/2}‖_F; the last step is due to −2I ⪯ ζ ⪯ 2I].
We have

    S_α^2 = Σ_{j≠k} x_j^{2α} η_{jk}^2 x_k^{p+1−2α}.

Therefore

    Σ_{1≤α<(p+1)/2} S_α^2 ≤ Σ_{j≠k} Σ_{µ+τ=p+1, µ,τ≥1} η_{jk}^2 x_j^µ x_k^τ;

whence

    |J1| ≤ 24 (Σ_{p=2}^∞ (1/p!) Σ_{1≤α,γ<(p+1)/2} S_α S_γ)/Σ_ν e^{x_ν}
         = 24 (Σ_{p=2}^∞ (1/p!) (Σ_{1≤α<(p+1)/2} S_α)^2)/Σ_ν e^{x_ν}
         ≤ 24 (Σ_{p=2}^∞ (p/p!) Σ_{1≤α<(p+1)/2} S_α^2)/Σ_ν e^{x_ν}
         ≤ 24 (Σ_{q=1}^∞ (1/q!) Σ_{j≠k} Σ_{µ+τ=q+2, µ,τ≥1} η_{jk}^2 x_j^µ x_k^τ)/Σ_ν e^{x_ν}.
Since 0 < x_j ≤ L, we clearly have

    Σ_{q=1}^∞ (1/q!) Σ_{j≠k} Σ_{µ+τ=q+2, µ,τ≥1} η_{jk}^2 x_j^µ x_k^τ ≤ 2L Σ_{q=1}^∞ (1/q!) Σ_{j≠k} Σ_{α+β=q+1, α,β≥1} η_{jk}^2 x_j^α x_k^β,

and we arrive at

    |J1| ≤ 48L R2.   (34)

Combining (31), (32) and (34), we arrive at the relation

    |d³f| ≤ O(1)L d²f,
as claimed.
4.2.2 The log-exp barrier
As before, now that the compatibility result is established, we can state and prove the main theorem for the log-exp construction.
Theorem 4.2 Let K be a closed convex cone with a nonempty interior in R^n, let F_−(u) be a ϑ_−-self-concordant barrier for K, and let A_i : R^n → S^{n_i}, i = 1,…,m, be linear mappings such that

    u ∈ int K ⇒ A_i(u) ≻ 0, i = 1,…,m.

Let us set A(u) = Diag{A_1(u),…,A_m(u)}. Given L > ln N, where N = Σ_i n_i, consider the function

    g(u) := ln(Tr(exp{L A(u)})) : int K → R.

This function is convex and satisfies the relation

    u ∈ int K, u ± du ∈ K, g(u) ≤ L ⇒ |D^3 g(u)[du,du,du]| ≤ O(1)L D^2 g(u)[du,du].   (35)

Consequently, the function

    Φ(u) = −ln(L − g(u)) + O(1)L^2 F_−(u)

is a self-concordant barrier with the parameter

    ϑ = 1 + O(1)L^2 ϑ_−

for the set K(L) := cl{u ∈ int K : g(u) ≤ L}. Moreover, when δ ∈ (0,1) and L ≥ (ln N)/δ, we have

    {u ∈ K : A_i(u) ⪯ (1 − δ) I_{n_i}, i = 1,…,m} ⊆ K(L) ⊆ {u ∈ K : A_i(u) ⪯ I_{n_i}, i = 1,…,m}.   (36)

Proof. Same as in the proof of Proposition 2.1, all we need is to verify (35), which is immediate. Indeed, let u, du satisfy the premise in (35), and let x = L A(u), h = L A(du). Since A is a linear mapping which maps int K into int S^N_+, we have x ≻ 0 and x ± h ⪰ 0. Moreover, from g(u) ≤ L it follows that x ⪯ L I_N. Setting f(y) = ln Tr(exp{y}), we have g(v) = f(L A(v)), whence D^k g(u)[du,…,du] = D^k f(x)[h,…,h], k = 0, 1,…. As we have seen, x, h satisfy the premise in (27); applying Proposition 4.3, we arrive at the conclusion in (35).
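The inclusions (36) rest on the smooth-max property of g: if λ_1,…,λ_N are the eigenvalues of L·A(u), then max_j λ_j ≤ g(u) = ln Σ_j e^{λ_j} ≤ max_j λ_j + ln N. A quick numerical check of this bound (Python; ours, not part of the paper):

```python
import math
import random

def logsumexp(xs):
    """ln sum_j exp(x_j) = ln Tr(exp X) for X = Diag(xs)."""
    m = max(xs)
    return m + math.log(sum(math.exp(v - m) for v in xs))

random.seed(0)
checks = []
for _ in range(200):
    n = random.randint(1, 8)
    xs = [random.uniform(-5.0, 5.0) for _ in range(n)]
    # record (max eigenvalue, smooth max, additive gap ln n)
    checks.append((max(xs), logsumexp(xs), math.log(n)))
```

With xs the eigenvalues of L·A(u), g(u) ≤ L therefore forces λ_max(A_i(u)) ≤ 1, while λ_max(A_i(u)) ≤ 1 − δ together with L ≥ (ln N)/δ gives g(u) ≤ L(1 − δ) + ln N ≤ L; this is the whole content of (36).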
5 A Generalization to Convex Semi-infinite Programming
The reader must have recognized that there are certain uniform structures to the derivatives of the functions which we utilized in this paper. These structures seem critical in securing the necessary inequalities in the barrier calculus of Nesterov and Nemirovski [20], and in turn in obtaining the related barriers with the desired self-concordance properties. In this section, we show that many of these properties generalize to the case when our variable is infinite dimensional.
Let f_α(x) > 0 for all x ∈ R^n_{++} and let µ be a measure on the set of indices T. Let us define

    Φ(x) := (∫ f_α^p(x) µ(dα))^{1/p}.

Then for all 0 ≠ p ∈ R, we have

    DΦ(x)[h] = Φ(x) E_x{S_{x,h}(·)},
      where S_{x,h}(α) = Df_α(x)[h]/f_α(x),  π_x(α) = f_α^p(x)/∫ f_β^p(x) µ(dβ)  and  E_x{g(·)} = ∫ g(α) π_x(α) µ(dα);

    D^2 Φ(x)[h,h] = Φ(x) [ (p−1) E_x{σ_{x,h}^2} + E_x{D^2 f_α(x)[h,h]/f_α(x)} ],
      where σ_{x,h}(α) = S_{x,h}(α) − E_x{S_{x,h}(·)};

    D^3 Φ(x)[h,h,h] = Φ(x) [ E_x{D^3 f_α(x)[h,h,h]/f_α(x)}
      + (p−1) ( (p−2) E_x{σ_{x,h}^3(·)} − 3 E_x{S_{x,h}(·)} E_x{σ_{x,h}^2(·)} + 3 E_x{σ_{x,h}(·) · D^2 f_α(x)[h,h]/f_α(x)} ) ].
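The first-order formula can be checked by finite differences when µ is a counting measure over finitely many indices (Python; ours, not part of the paper; the positive linear forms f_α(x) = ⟨a_α, x⟩ are an arbitrary choice made for the test):

```python
import math

# counting measure over three indices; f_alpha(x) = <a_alpha, x> > 0 on x > 0
A = [(1.0, 2.0), (0.5, 1.5), (2.0, 0.3)]
p = -2.0                       # any p != 0 is admissible
x, h = (1.0, 2.0), (0.3, -0.1)

def fa(a, z):
    return a[0] * z[0] + a[1] * z[1]

def Phi(z):
    return sum(fa(a, z) ** p for a in A) ** (1.0 / p)

vals = [fa(a, x) ** p for a in A]
pi = [v / sum(vals) for v in vals]      # the weights pi_x(alpha)
S = [fa(a, h) / fa(a, x) for a in A]    # S_{x,h}(alpha); here Df_alpha(x)[h] = f_alpha(h)
analytic = Phi(x) * sum(w * s for w, s in zip(pi, S))   # Phi(x) * E_x{S_{x,h}}

t = 1e-6
fd = (Phi((x[0] + t * h[0], x[1] + t * h[1]))
      - Phi((x[0] - t * h[0], x[1] - t * h[1]))) / (2 * t)
```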
Proposition 5.1 Suppose that D^2 f_α(x)[h,h] ≥ 0 for every x ∈ R^n_{++} and for every h ∈ R^n. Let ξ_1 and ξ_2 be given such that

    sup_{x∈G; h: (x±h)≥0} |S_{x,h}| ≤ ξ_1

and

    D^3 f_α(x)[h,h,h] ≤ ξ_2 D^2 f_α(x)[h,h]  for every x ∈ R^n_{++} and every h such that (x ± h) ≥ 0.

Then, for every x ∈ R^n_{++} and every h ∈ R^n such that (x ± h) ≥ 0, we have

    |D^3 Φ(x)[h,h,h]| ≤ max{(2p − 1)ξ_1, 6(p − 1)ξ_1 + ξ_2} D^2 Φ(x)[h,h].

Proof. We simply substitute the bounds given in the assumption part of the statement into the expression for the third derivative (given immediately before the proposition), and the claim of the proposition easily follows.
6 Recovering a Good Dual Solution
In the previous sections we showed how to construct self-concordant barriers for convex approximations of the convex set of main interest G. Once we have such a barrier, we can use the general self-concordance theory, and we immediately have various path-following and potential-reduction interior-point algorithms to optimize a linear function over the approximation.

If we can compute the Legendre-Fenchel conjugate of our barrier efficiently, then we can even apply some primal-dual algorithms (as in [14]). However, if the Legendre-Fenchel conjugate is not available for efficient computation, then we are stuck with primal-only algorithms. Even in such a case, we would be interested in generating good dual solutions. This section is dedicated to showing a way to recover a good dual solution from a good, central primal solution.
Cones M_p. For y ∈ S^m and p ∈ [1, ∞], let |y|_p := ‖λ(y)‖_p. Further, let K be a closed convex cone with nonempty interior in a Euclidean space (E, ⟨·,·⟩) and let P be a linear mapping from E to S^m such that Px ≻ 0 whenever x ∈ int K. Finally, we define

    M_p := {(t, x) : x ∈ K, |Px|_p ≤ t}.

We have (⟨·,·⟩_F stands for the Frobenius inner product on S^m):

    M_p^* = {(τ, ξ) : x ∈ K, |Px|_p ≤ t ⇒ τt + ⟨x, ξ⟩ ≥ 0}
          = {(τ, ξ) : ∃(φ ∈ K^*, η ∈ S^m, σ ∈ R, |η|_q ≤ σ) : ⟨φ, x⟩ + σt − ⟨Px, η⟩_F = tτ + ⟨x, ξ⟩ ∀x, t}   [1/q + 1/p = 1]
          = {(τ, ξ) : ∃(η ∈ S^m, φ ∈ K^*) : ξ = φ − P^*η, |η|_q ≤ τ}.

Since P maps K into S^m_+, P^* maps S^m_+ into K^*; thus, whenever η′ ⪰ η, we have (P^*η′ − P^*η) ∈ K^*. It follows that if ξ = φ − P^*η with φ ∈ K^* and |η|_q ≤ τ, and η_+ is the "positive part" of η, then ξ = (φ + P^*η_+ − P^*η) − P^*η_+, where (φ + P^*η_+ − P^*η) ∈ K^* and |η_+|_q ≤ τ. We arrive at

    M_p^* = {(τ, ξ) : ∃(φ ∈ K^*, η ⪰ 0) : ξ = φ − P^*η, |η|_q ≤ τ}.

Thus, a primal-dual pair of conic problems associated with M_p is

    min_s { e^T s : As − b ∈ K, |P(As − b)|_p ≤ c^T s − d }   (P)

    max_{φ,η,τ} { ⟨φ − P^*η, b⟩ + τd : A^*(φ − P^*η) + τc = e, φ ∈ K^*, |η|_q ≤ τ, η ⪰ 0 }.   (D)
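The step that introduces the conjugate exponent q in the description of M_p^* is Hölder's inequality for eigenvalue norms (a consequence of von Neumann's trace inequality): |⟨x, η⟩_F| ≤ |x|_p |η|_q when 1/p + 1/q = 1. A numerical spot-check on random symmetric 2×2 matrices (Python; ours, not part of the paper):

```python
import math
import random

def eigvals2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return (mean - rad, mean + rad)

def schatten(a, b, c, p):
    """|y|_p = ||lambda(y)||_p for y = [[a, b], [b, c]]."""
    return sum(abs(l) ** p for l in eigvals2(a, b, c)) ** (1.0 / p)

random.seed(1)
p = 3.0
q = p / (p - 1.0)          # conjugate exponent: 1/p + 1/q = 1
gaps = []
for _ in range(300):
    x = [random.uniform(-2.0, 2.0) for _ in range(3)]   # (a, b, c) of x
    y = [random.uniform(-2.0, 2.0) for _ in range(3)]   # (a, b, c) of eta
    inner = x[0] * y[0] + x[2] * y[2] + 2 * x[1] * y[1]  # Frobenius <x, eta>_F
    gaps.append(schatten(*x, p) * schatten(*y, q) - abs(inner))
```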
Here, the data are given by (A, b, c, d, e), where e (no longer a vector of all ones) is arbitrary. Now, let p ≥ 2 be an integer, F be a ϑ-logarithmically homogeneous s.c.b. for K, α ≥ 1 and β = O(1)p^2; then

    Φ_p(t, x) = −α ln(t − |Px|_p) + β^2 F(x)

is a ϑ(p)-logarithmically homogeneous s.c.b. for M_p, with

    ϑ(p) = α + β^2 ϑ.
Let s_ρ be a central solution to (P):

    s_ρ = argmin_s { ρ e^T s + Φ_p(c^T s − d, As − b) }
    ⇓
    x_ρ := As_ρ − b,  t_ρ := c^T s_ρ − d,  ζ_ρ := P x_ρ,  ω_ρ := |ζ_ρ|_p,  ξ_ρ := [ω_ρ^{−1} ζ_ρ]^{p−1}
    ⇓
    ρe − (α/(t_ρ − ω_ρ)) [c − A^* P^* ξ_ρ] + β^2 A^* ∇F(As_ρ − b) = 0
    ⇓
    τ_ρ := α/(ρ(t_ρ − ω_ρ)),  η_ρ := τ_ρ ξ_ρ,  φ_ρ := −(β^2/ρ) ∇F(As_ρ − b),  λ := λ(ξ_ρ)
      [|ω_ρ^{−1} ζ_ρ|_p = 1 ⇒ ‖λ‖_{p/(p−1)} = 1]
    ⇓
    A^*(φ_ρ − P^* η_ρ) + τ_ρ c = e,  φ_ρ ∈ K^*,  η_ρ ⪰ 0,  |η_ρ|_q = τ_ρ ‖λ‖_{p/(p−1)} = τ_ρ.

That is, a central solution s_ρ generates a feasible solution (φ_ρ, η_ρ, τ_ρ) of (D). We have

    ⟨φ_ρ − P^*η_ρ, b⟩ + τ_ρ d
      = ⟨φ_ρ − P^*η_ρ, As_ρ⟩ + τ_ρ c^T s_ρ + ⟨φ_ρ − P^*η_ρ, b − As_ρ⟩ + τ_ρ (d − c^T s_ρ)
      = e^T s_ρ + (1/ρ) [∇Φ_p(c^T s_ρ − d, As_ρ − b)]^T (c^T s_ρ − d, As_ρ − b)
      = e^T s_ρ − ϑ(p)/ρ,

where the first two terms sum to s_ρ^T [A^*(φ_ρ − P^* η_ρ) + τ_ρ c] = s_ρ^T e.
Example. Let Q_i ⪰ 0. Then

    min_λ { −e^T λ : |Σ_i λ_i Q_i|_p ≤ 1, λ ≥ 0 }   [A = Id, b = 0, K = R^n_+, c = 0, d = −1, Pλ = Σ_i λ_i Q_i]   (P)
    ⇓
    max_{φ,τ,η} { −τ : φ_i − Tr(η Q_i) = −1 ∀i, φ ≥ 0, |η|_q ≤ τ, η ⪰ 0 }   (D)
    ⇕
    −min_η { |η|_q : Tr(η Q_i) ≥ 1 ∀i }
    ⇕
    −min_η { |η|_q : Tr(η Q_i) ≥ 1 ∀i, η ⪰ 0 }.
Cones N_p. Let K be a closed convex cone with a nonempty interior in a Euclidean space (E, ⟨·,·⟩) and let P be a linear mapping from E to S^m such that Px ≻ 0 whenever x ∈ int K. For a positive integer p, let

    N_p := cl{(t, x) : t > 0, x ∈ int K, (Tr([Px]^{−p}))^{−1/p} ≥ t}.

We have

    N_p^* = {(τ, ξ) : x ∈ int K, t > 0, |[Px]^{−1}|_p ≤ t^{−1} ⇒ τt + ⟨ξ, x⟩ ≥ 0}
          = {(τ, ξ) : x ∈ int K, t > 0, |t^2 [Px]^{−1}|_p ≤ t ⇒ τt + ⟨ξ, x⟩ ≥ 0}
          = {(τ, ξ) : min_{x,y,t} {τt + ⟨ξ, x⟩ : t^2 [Px]^{−1} ⪯ y, |y|_p ≤ t, x ∈ int K, t > 0} ≥ 0}
          = {(τ, ξ) : min_{x,y,t} {τt + ⟨ξ, x⟩ : [ y  tI ; tI  Px ] ⪰ 0, |y|_p ≤ t, t > 0, x ∈ int K} ≥ 0}
          = {(τ, ξ) : min_{x,y,t} {τt + ⟨ξ, x⟩ : [ y  tI ; tI  Px ] ⪰ 0, |y|_p ≤ t, x ∈ K} ≥ 0}
          = {(τ, ξ) : ∃(α, β, η, γ, σ, φ) : [ α  β^T ; β  η ] ⪰ 0, |γ|_q ≤ σ, φ ∈ K^*,
                      Tr(yα) + 2t Tr(β) + Tr(η Px) + tσ − Tr(yγ) + ⟨φ, x⟩ = τt + ⟨ξ, x⟩ ∀x, y, t}   [1/q + 1/p = 1]
          = {(τ, ξ) : ∃(η ⪰ 0, φ ∈ K^*, σ) : ξ = φ + P^*η, |σ − τ| ≤ 2 max_{α,β} {Tr(β) : [ α  β^T ; β  η ] ⪰ 0, |α|_q ≤ σ}}.
Let us solve the optimization problem

    max_{α,β} { Tr(β) : [ α  β^T ; β  η ] ⪰ 0, |α|_q ≤ σ }.

When solving the problem, we may assume without loss of generality that η is diagonal. In this case, the feasible set of the problem remains invariant under the mappings (α, β) ↦ (GαG, GβG), where G is a diagonal matrix with diagonal entries ±1, and the objective function also remains invariant under these mappings. Since the problem is convex, it follows that the optimal value remains unchanged when α, β are restricted to be invariant with respect to the above transformations (that is, we can assume that α and β are diagonal). In this case, the problem becomes

    Opt = max_{α_i, β_i} { Σ_i β_i : β_i^2 ≤ α_i η_i, Σ_i α_i^{p/(p−1)} ≤ σ^{p/(p−1)} },
where η_i are the eigenvalues of η. We have

    Opt = max_{α_i} { Σ_i η_i^{1/2} α_i^{1/2} : Σ_i α_i^{p/(p−1)} ≤ σ^{p/(p−1)} }
        = max_δ { Σ_i δ_i η_i^{1/2} : ‖δ‖_{2p/(p−1)} ≤ σ^{1/2} }   [δ_i = α_i^{1/2}]
        = σ^{1/2} (Σ_i η_i^{p/(p+1)})^{(p+1)/(2p)}
        = (σ |η|_{p/(p+1)})^{1/2},

where η ∈ S^m_+; note that |η|_r is concave in η ∈ S^m_+ when 0 < r ≤ 1. Thus,

    N_p^* = {(τ, ξ) : ∃(η ⪰ 0, φ ∈ K^*, σ ≥ 0) : ξ = φ + P^*η, |τ − σ| ≤ 2(σ |η|_{p/(p+1)})^{1/2}}
          = {(τ, ξ) : ∃(η ⪰ 0, φ ∈ K^*) : ξ = φ + P^*η, 0 ≤ τ + |η|_{p/(p+1)}},

where the second description follows by minimizing σ − 2(σs)^{1/2} over σ ≥ 0 (with s = |η|_{p/(p+1)}), the minimum being −s, attained at σ = s.
Thus, a primal-dual pair of conic problems associated with N_p is

    min_s { e^T s : As − b ∈ K, c^T s − d ≥ 0, |[P(As − b)]^{−1}|_p ≤ 1/(c^T s − d) }   (P)

    max_{φ,η,τ} { ⟨φ + P^*η, b⟩ + τd : A^*(φ + P^*η) + τc = e, φ ∈ K^*, 0 ≤ τ + |η|_{p/(p+1)}, η ⪰ 0 }.   (D)
Now, let F be a ϑ-logarithmically homogeneous s.c.b. for K, α, γ ≥ 1 and β = O(1)p^2; then

    Ψ_p(t, x) = −α ln(|[Px]^{−1}|_p^{−1} − t) − γ ln t + β^2 F(x)

is a ϑ(p)-logarithmically homogeneous s.c.b. for N_p, with

    ϑ(p) = α + γ + β^2 ϑ.
Let s_ρ be a central solution to (P):

    s_ρ = argmin_s { ρ e^T s + Ψ_p(c^T s − d, As − b) }
    ⇓
    x_ρ := As_ρ − b,  t_ρ := c^T s_ρ − d,  ζ_ρ := [P x_ρ]^{−1},  ω_ρ := |ζ_ρ|_p,  ξ_ρ := [ω_ρ^{−1} ζ_ρ]^{p+1}
    ⇓
    ρe − (α/(ω_ρ^{−1} − t_ρ)) [A^* P^* ξ_ρ − c] − (γ/t_ρ) c + β^2 A^* ∇F(As_ρ − b) = 0
    ⇓
    δ_ρ := α/(ρ(ω_ρ^{−1} − t_ρ)),  τ_ρ := γ/(ρ t_ρ) − δ_ρ,  η_ρ := δ_ρ ξ_ρ,
    φ_ρ := −(β^2/ρ) ∇F(As_ρ − b),  λ := λ(ξ_ρ)   [|ω_ρ^{−1} ζ_ρ|_p = 1 ⇒ ‖λ‖_{p/(p+1)} = 1]
    ⇓
    A^*(φ_ρ + P^* η_ρ) + τ_ρ c = e,  φ_ρ ∈ K^*,  η_ρ ⪰ 0,  |η_ρ|_{p/(p+1)} = δ_ρ ‖λ‖_{p/(p+1)} = δ_ρ ≥ −τ_ρ.

So, a central solution s_ρ generates a feasible solution (φ_ρ, η_ρ, τ_ρ) of (D). We have

    ⟨φ_ρ + P^*η_ρ, b⟩ + τ_ρ d
      = ⟨φ_ρ + P^*η_ρ, As_ρ⟩ + τ_ρ c^T s_ρ + ⟨φ_ρ + P^*η_ρ, b − As_ρ⟩ + τ_ρ (d − c^T s_ρ)
      = e^T s_ρ + (1/ρ) [∇Ψ_p(c^T s_ρ − d, As_ρ − b)]^T (c^T s_ρ − d, As_ρ − b)
      = e^T s_ρ − ϑ(p)/ρ,

where the first two terms sum to s_ρ^T [A^*(φ_ρ + P^* η_ρ) + τ_ρ c] = s_ρ^T e.
7 Lipschitz Continuous Gradients

For the minimization of convex functions with Lipschitz continuous gradients, under certain favorable circumstances, first-order methods can achieve the further improved iteration bound of O(√(L/ε)) (see [19]; also see [13]). So, let us look at a p-norm-type function applied to the eigenvalues of a symmetric matrix. Let p ≥ 3, and let

    H(x) = ‖λ_+(x)‖_p^2,  S(x) = ‖λ_+(x)‖_p^p,

where x is a symmetric matrix, λ_+(x) is the vector with the entries max[0, λ_i(x)], and λ_i(x) are the eigenvalues of x.
Proposition 7.1 The function H(·) is convex and continuously differentiable with Lipschitz continuous gradient; specifically,

    x, y ∈ S^n ⇒ |H′(x) − H′(y)|_q ≤ 2(p−1)|x − y|_p.

Proof. It suffices to verify that H is continuously differentiable, twice continuously differentiable except at the origin, and such that

    x ≠ 0 ⇒ 0 ≤ D^2 H(x)[h,h] ≤ 2(p−1)|h|_p^2 ∀h.   (37)
Let γ be a simple closed curve in the right half-plane which encircles [0, L]. For x ∈ S^n with max_i λ_i(x) < L we have

    S(x) = (1/2πi) ∮_γ z^p Tr((zI − x)^{−1}) dz,

whence, as is immediately seen, S is twice continuously differentiable everywhere, so that H is twice continuously differentiable (except at the origin); moreover, H clearly is continuously differentiable. It is also well-known that H is convex (as a symmetric convex function of the eigenvalues of a symmetric matrix).
Let λ_i be the eigenvalues of x, I = {i : λ_i > 0}, J = {i : λ_i ≤ 0}, let h be a symmetric matrix and let h_{ij} be the entries of h in the orthonormal eigenbasis of x. Then

    DS(x)[h] = (1/2πi) ∮_γ p z^{p−1} Tr((zI − x)^{−1} h) dz = p Σ_{i∈I} λ_i^{p−1} h_{ii},

    D^2 S(x)[h,h] = (1/2πi) ∮_γ p z^{p−1} Tr((zI − x)^{−1} h (zI − x)^{−1} h) dz
      = (1/2πi) ∮_γ p z^{p−1} (Σ_{i,j} h_{ij}^2 /((z − λ_i)(z − λ_j))) dz
      = p Σ_{i,j∈I} h_{ij}^2 (λ_i^{p−1} − λ_j^{p−1})/(λ_i − λ_j) + 2p Σ_{i∈I, j∈J} h_{ij}^2 λ_i^{p−1}/(λ_i − λ_j)
        [where (a^{p−1} − b^{p−1})/(a − b) := (p−1) a^{p−2} when a = b]
      ≤ p(p−1) Σ_{i,j∈I} h_{ij}^2 (λ_i^{p−2} + λ_j^{p−2})/2 + 2p Σ_{i∈I, j∈J} h_{ij}^2 λ_i^{p−2}
        [due to (θ^q − 1)/(θ − 1) ≤ q(θ^{q−1} + 1)/2 for θ, q ≥ 1, and λ_i − λ_j ≥ λ_i for j ∈ J]
      ≤ p(p−1) Σ_{i∈I} λ_i^{p−2} Σ_j h_{ij}^2   [since 2 ≤ p − 1 for p ≥ 3],

whence

    DH(x)[h] = (2/p)(S(x))^{2/p − 1} DS(x)[h] = 2 (S(x))^{2/p − 1} Σ_{i∈I} λ_i^{p−1} h_{ii},

    D^2 H(x)[h,h] = (2/p)((2/p) − 1)(S(x))^{2/p − 2} (DS(x)[h])^2 + (2/p)(S(x))^{2/p − 1} D^2 S(x)[h,h]
      ≤ (2/p)(S(x))^{2/p − 1} D^2 S(x)[h,h].
Since D^2 H(x)[h,h] is homogeneous of degree 0 with respect to x, we may assume when computing D^2 H(x)[h,h] that S(x) = 1, that is, Σ_{i∈I} λ_i^p = 1. In this case, setting η_i := (Σ_j h_{ij}^2)^{1/2}, we get

    D^2 H(x)[h,h] ≤ (2/p) D^2 S(x)[h,h] ≤ 2(p−1) Σ_{i∈I} λ_i^{p−2} η_i^2
      ≤ 2(p−1) (Σ_{i∈I} λ_i^p)^{(p−2)/p} (Σ_{i∈I} η_i^p)^{2/p} ≤ 2(p−1)|h|_p^2,

where the concluding inequality is due to the following observation:

    For h ∈ S^n and p ≥ 2, let η be the vector with entries equal to the Euclidean lengths of the columns of h. Then ‖η‖_p ≤ |h|_p.

Indeed, setting h = v s v^T, where v is orthogonal and s is diagonal with diagonal entries s_i, the Euclidean norms of the columns of h are the same as the Euclidean norms of the columns of s v^T: η_j^2 = Σ_i s_i^2 v_{ji}^2. In other words, the vector [η]^2 := (η_1^2,…,η_n^2) is obtained from the vector [s]^2 := (s_1^2,…,s_n^2) by multiplication by a doubly stochastic matrix. It follows that ‖η‖_p^2 = ‖[η]^2‖_{p/2} ≤ ‖[s]^2‖_{p/2} = ‖s‖_p^2, as claimed.

We have demonstrated (37).
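The column-norm observation at the end of the proof is also easy to probe numerically (Python; ours, not part of the paper; random symmetric 2×2 matrices, eigenvalues by the closed 2×2 formula):

```python
import math
import random

def eigvals2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return (mean - rad, mean + rad)

random.seed(2)
p = 4.0
margins = []
for _ in range(300):
    a, b, c = (random.uniform(-2.0, 2.0) for _ in range(3))
    col = [math.hypot(a, b), math.hypot(b, c)]    # Euclidean lengths of the two columns
    eta_p = sum(v ** p for v in col) ** (1.0 / p)                        # ||eta||_p
    lam_p = sum(abs(l) ** p for l in eigvals2(a, b, c)) ** (1.0 / p)     # |h|_p
    margins.append(lam_p - eta_p)
```

Every margin is nonnegative, in line with the doubly stochastic majorization argument above.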
8 Conclusion and Future Work
There are three clear research directions motivated by this work:
1. Design and analysis of cutting-plane interior-point algorithms based on the self-concordant barriers constructed here. One major advantage of our barriers over those used in the pre-existing work ([1, 2, 8, 12]) is that we do not need to drop any constraints, and the addition of new constraints does not change the barrier parameter ϑ.
2. Further extension of the theory to constraints defined by other partial orders, cones (e.g.,partial orders induced by hyperbolic cones [22]).
3. Improvement of the computational complexity of evaluating f , f ′ and f ′′ for such self-concordant barriers.
References

[1] K. M. Anstreicher, Towards a practical volumetric cutting plane method for convex programming, SIAM J. Optim. 9 (1999) 190–206.

[2] D. S. Atkinson and P. M. Vaidya, A cutting plane algorithm for convex programming that uses analytic centers, Math. Prog. 69 (1995) 1–43.

[3] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization, MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2001.

[4] D. Bienstock, Potential Function Methods for Approximately Solving Linear Programming Problems: Theory and Practice, Kluwer Academic Publishers, Boston, USA, 2002.

[5] D. Bienstock and G. Iyengar, Approximating fractional packings and coverings in O(1/ε) iterations, SIAM J. Comput. 35 (2006) 825–854.

[6] G. Cornuejols, Combinatorial Optimization: Packing and Covering, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 2001.

[7] N. Garg and J. Konemann, Faster and simpler algorithms for multicommodity flow and other fractional packing problems, Proc. 39th Ann. Symp. on FOCS (1998) 300–309.

[8] J.-L. Goffin, Z.-Q. Luo and Y. Ye, Complexity analysis of an interior cutting plane method for convex feasibility problems, SIAM J. Optim. 6 (1996) 638–652.

[9] M. D. Grigoriadis and L. G. Khachiyan, Coordination complexity of parallel price-directive decomposition, Math. Oper. Res. 21 (1996) 321–340.

[10] P. Klein, S. A. Plotkin, C. Stein and E. Tardos, Faster approximation algorithms for the unit capacity concurrent flow problem with applications to routing and finding sparse cuts, SIAM J. Comput. 23 (1994) 466–487.

[11] T. Leighton, F. Makedon, S. A. Plotkin, C. Stein, E. Tardos and S. Tragoudas, Fast approximation algorithms for multicommodity flow problems, J. Comput. System Sci. 50 (1995) 228–243.

[12] J. E. Mitchell and M. J. Todd, Solving combinatorial optimization problems using Karmarkar's algorithm, Math. Prog. 56 (1992) 245–284.

[13] A. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM J. Optim. 15 (2004) 229–251.

[14] A. Nemirovski and L. Tuncel, "Cone-free" primal-dual path-following and potential reduction polynomial time interior-point methods, Math. Prog. 102 (2005) 261–294.

[15] A. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley-Interscience Series in Discrete Mathematics, John Wiley & Sons, New York, 1983.

[16] Yu. Nesterov, Dual extrapolation and its applications for solving variational inequalities and related problems, Math. Prog., to appear.

[17] Yu. Nesterov, Rounding of convex sets and efficient gradient methods for linear programming problems, CORE Discussion Paper 2004/4, Louvain-la-Neuve, Belgium, February 2004.

[18] Yu. Nesterov, Unconstrained convex minimization in relative scale, CORE Discussion Paper 2003/96, Louvain-la-Neuve, Belgium, December 2003.

[19] Yu. Nesterov, Smooth minimization of nonsmooth functions, Math. Prog. 103 (2005) 127–152.

[20] Yu. Nesterov and A. Nemirovskii, Interior Point Polynomial Methods in Convex Programming, SIAM Series in Applied Mathematics, SIAM, Philadelphia, 1994.

[21] S. A. Plotkin, D. B. Shmoys and E. Tardos, Fast approximation algorithms for fractional packing and covering problems, Math. Oper. Res. 20 (1995) 257–301.

[22] J. Renegar, Hyperbolic programs, and their derivative relaxations, Found. Comput. Math. 6 (2006) 59–79.

[23] F. Shahrokhi and D. W. Matula, The maximum concurrent flow problem, J. Assoc. Comput. Mach. 37 (1990) 318–334.