On Modern Convex Optimization: The Interior-Point Revolution
Andre Tits
University of Maryland - College Park
May 16, 2008
Andre Tits (UMD) On Modern Convex Optimization May 16, 2008 1 / 24
My plan for today: Convey to you the pervasiveness, the power, the elegance, and, I hope, the excitement of modern convex optimization.
Disclaimer: This is but a glimpse at convex optimization, with focus on simple key ideas rather than on the most efficient methods. Rigor is not guaranteed.
No need to take notes: Slides are available for download.
Good news, bad news.
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Convex set, Convex function
A set $S$ is convex if it contains every line segment whose endpoints are in $S$.
A function f is convex if its epigraph is convex.
When $S$ and $f$ are convex, all stationary points for the problem
$$\text{minimize } f(x) \text{ subject to } x \in S$$
are global minimizers.
Convex optimization problems are “easy” to solve.
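A definition this clean invites a quick numerical sanity check. The sketch below is illustrative only (not from the talk): it tests Jensen's inequality $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$ on random segments, which can refute convexity but only suggest it.

```python
import numpy as np

def looks_convex(f, dim=2, trials=1000, seed=0):
    """Numerically test Jensen's inequality on random segments.

    Returns False if a violation is found; True means no violation
    was observed, which suggests (but does not prove) convexity.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=(2, dim))
        lam = rng.uniform()
        z = lam * x + (1 - lam) * y
        if f(z) > lam * f(x) + (1 - lam) * f(y) + 1e-12:
            return False
    return True

print(looks_convex(lambda v: np.sum(v**2)))    # convex: True
print(looks_convex(lambda v: -np.sum(v**2)))   # concave: False
```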
In the past two decades, a quasi-revolution took place in the development and analysis of effective and efficient methods for the solution of convex problems, in particular interior-point methods, starting with the ground-breaking work of Nesterov and Nemirovski (following that of Karmarkar in linear optimization).
Convex problems are ubiquitous
A few areas of application:
signal processing
VLSI design
estimation
communication networks
circuit design
automatic control
statistics
finance
etc.
Example: VLSI gate sizing
Let $t_0 = 0$, and let $T > 0$, $\underline{s}_i, \overline{s}_i > 0$, and $a_{ij} > 0$ be given.
The minimal area gate sizing problem can be written as
$$\min_{s_i, t_i}\ \sum_i s_i \quad \text{s.t.} \quad t_j + d_i(s) \le t_i \text{ whenever } j \in \text{fanin}(i), \quad t_n \le T, \quad \underline{s}_i \le s_i \le \overline{s}_i\ \forall i,$$
where $s_i$ is the size (area) of gate $i$, $t_i$ is an upper bound on the delay from the primary input to the output of gate $i$, and
$$d_i(s) := a_{i0} + \sum_{k \in \text{fanout}(i)} \frac{a_{ik} s_k}{s_i}.$$
Equivalently,
$$\min_{s_i, t_i}\ \sum_i s_i \quad \text{s.t.} \quad a_{i0} + \sum_{k \in \text{fanout}(i)} a_{ik} s_k s_i^{-1} \le t_i - t_j, \quad t_n \le T, \quad \underline{s}_i \le s_i \le \overline{s}_i.$$
The cost and constraint functions being “posynomials”, this is a geometric programming problem. It is NOT convex, but it can be “convexified”, as we show next.
VLSI gate sizing: Recasting into a convex problem
The change of variables $x_i := \log(s_i)$ yields
$$\min_{x_i, t_i}\ \sum_i e^{x_i} \quad \text{s.t.} \quad a_{i0} + \sum_{k \in \text{fanout}(i)} a_{ik} e^{x_k - x_i} \le t_i - t_j, \quad t_n \le T, \quad \log(\underline{s}_i) \le x_i \le \log(\overline{s}_i),$$
which is convex. Because exponential functions tend to cause numerical difficulties, typically the following reformulation (obtained by taking logs) is used instead:
$$\text{minimize } \log\Big(\sum_i e^{x_i}\Big) \quad \text{s.t.} \quad \log\Big(a_{i0} + \sum_{k \in \text{fanout}(i)} a_{ik} e^{x_k - x_i}\Big) - \log(t_i - t_j) \le 0, \quad t_n \le T, \quad \log(\underline{s}_i) \le x_i \le \log(\overline{s}_i),$$
which is still convex.
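As a concrete illustration (not from the talk), the convexified problem can be handed to a general-purpose NLP solver. Below is a minimal sketch for a hypothetical two-gate chain; the coefficients `a10`, `a12`, `a20`, `a2L`, the delay budget `T`, and the size bounds are all made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical two-gate chain: gate 1 drives gate 2, which drives a fixed load.
# Delay model: d_i(s) = a_i0 + sum_{k in fanout(i)} a_ik * s_k / s_i.
a10, a12 = 1.0, 2.0   # made-up coefficients for gate 1
a20, a2L = 1.0, 3.0   # gate 2: fixed-load term a2L / s_2
T = 8.0               # made-up delay budget
lo, hi = np.log(0.5), np.log(10.0)   # log-size bounds

# Variables v = (x1, x2, t1, t2) with x_i = log(s_i); area = e^x1 + e^x2.
cost = lambda v: np.exp(v[0]) + np.exp(v[1])

cons = [
    # t0 = 0, so d1(s) <= t1:  a10 + a12 * e^{x2 - x1} <= t1
    {'type': 'ineq', 'fun': lambda v: v[2] - (a10 + a12 * np.exp(v[1] - v[0]))},
    # t1 + d2(s) <= t2:  a20 + a2L * e^{-x2} <= t2 - t1
    {'type': 'ineq', 'fun': lambda v: v[3] - v[2] - (a20 + a2L * np.exp(-v[1]))},
    {'type': 'ineq', 'fun': lambda v: T - v[3]},   # t2 <= T
]
bounds = [(lo, hi), (lo, hi), (0, T), (0, T)]

# Feasible starting point; SLSQP then shrinks the area until delays bind.
res = minimize(cost, x0=[1.0, 1.0, 4.0, 8.0], bounds=bounds,
               constraints=cons, method='SLSQP')
print(res.success, np.exp(res.x[:2]))   # optimal sizes s1, s2
```

Each constraint set here is convex (each `fun` is concave in $v$), matching the claim on the slide.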
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Duality in linear optimization
Primal: $v_P := \min\{c^T x : Ax = b,\ x \in \mathbb{R}^n_+\}$
Dual: $v_D := \max\{b^T y : A^T y + s = c,\ s \in \mathbb{R}^n_+\}$
For feasible $(x, y, s)$, the “duality gap” is $s^T x$:
$$c^T x - b^T y = (A^T y + s)^T x - y^T b = s^T x.$$
Weak duality: $v_P \ge v_D$. Indeed,
$$v_P - v_D = \inf_{x,y \text{ feasible}}\{c^T x - b^T y\} = \inf_{x,s \text{ feasible}}\{s^T x\} \ge 0.$$
Note: For feasible $(x, s)$, $s^T x = 0$ iff $x_i s_i = 0$ for all $i$.
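The gap identity is easy to check numerically: build a feasible primal-dual pair by construction (random made-up data) and compare $c^T x - b^T y$ with $s^T x$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.normal(size=(m, n))

# Build a feasible primal-dual pair by construction.
x = rng.uniform(0.1, 1.0, size=n)   # x > 0
b = A @ x                           # primal feasible by construction
y = rng.normal(size=m)
s = rng.uniform(0.1, 1.0, size=n)   # s > 0
c = A.T @ y + s                     # dual feasible by construction

gap = c @ x - b @ y
print(np.isclose(gap, s @ x), gap >= 0)   # True True
```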
(Gyula, aka Julius) Farkas’s Lemma (1902; present form due to Albert Tucker):
$$Mu = d,\ u \in \mathbb{R}^n_+ \ \text{is consistent} \quad \text{XOR} \quad -M^T v \in \mathbb{R}^n_+,\ d^T v > 0 \ \text{is consistent}.$$
Strong duality: If the primal (or the dual) is feasible,
$$v_P = v_D =: v_{PD}.$$
Proof follows from Farkas’s lemma: see next slide (skipped).
Proof of strong duality in linear optimization
Below, “icst” means “is consistent”. Given $\theta \in \mathbb{R}$, using Farkas’s lemma, we get
$$\left\{\begin{array}{l} Ax = b \\ x \ge 0 \\ c^T x \le \theta \end{array}\right\} \text{icst} \iff \left\{\begin{array}{l} \begin{bmatrix} A & 0 \\ c^T & 1 \end{bmatrix}\begin{bmatrix} x \\ t \end{bmatrix} = \begin{bmatrix} b \\ \theta \end{bmatrix} \\ (x, t) \in \mathbb{R}^{n+1}_+ \end{array}\right\} \text{icst} \quad \text{XOR} \quad \left\{\begin{array}{l} -\begin{bmatrix} A^T & c \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y \\ u \end{bmatrix} \in \mathbb{R}^{n+1}_+ \\ \begin{bmatrix} b^T & \theta \end{bmatrix}\begin{bmatrix} y \\ u \end{bmatrix} > 0 \end{array}\right\} \text{icst}.$$
Now suppose that the dual is feasible and $\theta < v_P$. Then the left-hand side is inconsistent, and hence the right-hand side must be consistent. That is, there exists $(y, u) \in \mathbb{R}^{m+1}$ such that
$$A^T y + cu \le 0, \quad u \le 0, \quad b^T y + \theta u > 0.$$
If $u = 0$ then $v_D = +\infty$. (Indeed, take $y = y_0 + \alpha y$, with $y_0$ feasible (we assumed dual feasibility) and $\alpha \to +\infty$.) Hence $v_D \ge v_P$. On the other hand, if $u < 0$ then, w.l.o.g., $u = -1$. We then get
$$A^T y \le c \quad \text{and} \quad b^T y > \theta.$$
The existence of such $y$ implies that $v_D > \theta$. We conclude that $v_D > \theta$ whenever $v_P > \theta$, again implying that $v_D \ge v_P$. In view of weak duality, strong duality follows.
With primal feasibility assumed instead, the claim follows with a similar proof, starting by taking $\theta > v_D$ and expressing “$A^T y \le c$, $b^T y \ge \theta$” as
$$A^T(y_+ - y_-) + s = c, \quad b^T(y_+ - y_-) - t = \theta, \quad (s, t, y_+, y_-) \ge 0.$$
Primal-dual interior-point methods for LO: Set up
Notation: Capital letters $X$, $S$, $X_k$, $S_k$ denote diagonal matrices whose entries are the components of the corresponding lower-case vectors. E.g., $X = \mathrm{diag}(x_i)$.
In view of strong duality, a triple $(x, y, s)$ solves the primal-dual pair iff:
$Ax = b$, $x \ge 0$ (primal feasibility) (also $\nabla_y L_D = 0$)
$A^T y + s = c$, $s \ge 0$ (dual feasibility) (also $\nabla_x L_P = 0$)
$x^T s = 0$ (zero duality gap), i.e., $Xs = 0$ (complementary slackness)
Let $\beta \in (0, 1)$. Let $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$, where
$$\mathcal{N}_2(\beta) = \{(x_k, y_k, s_k) : \text{PD feasible},\ \|X_k s_k - \mu_k e\| < \beta\mu_k\},$$
with $\mu_k := x_k^T s_k/n$ (the “duality measure”).
Key idea: Focusing on the equalities in the optimality condition, perform a Newton iteration towards solving the square system, with $\sigma \in (0, 1)$,
$$Ax = b, \qquad A^T y + s = c, \qquad Xs = \sigma\mu_k e.$$
Primal-dual interior-point methods for LO: “Short Step” method
Why $Xs = \sigma\mu_k e$ rather than $Xs = 0$? With $\sigma$ close enough to 1, this will make the full Newton step acceptable globally.
Thus, solve the Newton system
$$A\Delta x_k = 0, \qquad A^T\Delta y_k + \Delta s_k = 0, \qquad X_k s_k + X_k\Delta s_k + S_k\Delta x_k = \sigma\mu_k e,$$
and set
$$x_{k+1} = x_k + \Delta x_k, \qquad y_{k+1} = y_k + \Delta y_k, \qquad s_{k+1} = s_k + \Delta s_k.$$
It turns out that, with an appropriate selection of $\sigma$ (e.g., $\sigma = 1 - 1/\sqrt{n}$), this exceedingly simple algorithm has rather strong convergence properties.
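The short-step method fits in a few dozen lines of numpy. The sketch below is illustrative, not production code: it assumes a strictly feasible, well-centered starting triple is supplied (finding one is a separate matter), and it uses the slightly more conservative classical constant $\sigma = 1 - 0.4/\sqrt{n}$, of the same form as the slide's $1 - 1/\sqrt{n}$, to keep the iterates safely inside $\mathcal{N}_2(\beta)$.

```python
import numpy as np

def short_step_ipm(A, b, c, x, y, s, tol=1e-8, max_iter=1000):
    """Short-step primal-dual IPM for min c'x s.t. Ax = b, x >= 0.

    Assumes (x, y, s) strictly feasible and well centered. Each iteration
    takes one full Newton step on the perturbed optimality system
    Ax = b, A'y + s = c, Xs = sigma*mu*e.
    """
    m, n = A.shape
    sigma = 1.0 - 0.4 / np.sqrt(n)   # slide suggests 1 - 1/sqrt(n);
                                     # conservative constant used here
    for _ in range(max_iter):
        mu = x @ s / n               # duality measure
        if n * mu < tol:             # duality gap x's = n*mu
            break
        # Block-eliminate dx, ds: normal equations in dy,
        #   A D A' dy = A (x - sigma*mu/s),  D = diag(x/s),
        # then ds = -A' dy and dx = sigma*mu/s - x - D ds.
        d = x / s
        dy = np.linalg.solve(A @ (d[:, None] * A.T),
                             A @ (x - sigma * mu / s))
        ds = -A.T @ dy
        dx = sigma * mu / s - x - d * ds
        x, y, s = x + dx, y + dy, s + ds   # full step, no line search
    return x, y, s

# Demo on a random instance with a perfectly centered starting point:
rng = np.random.default_rng(0)
m, n = 3, 25
A = rng.normal(size=(m, n))
x0, s0, y0 = np.ones(n), np.ones(n), np.zeros(m)
b, c = A @ x0, A.T @ y0 + s0     # (x0, y0, s0) feasible with X0 s0 = e
x, y, s = short_step_ipm(A, b, c, x0, y0, s0)
print(x @ s < 1e-8, np.allclose(A @ x, b))
```

Note that $\Delta x_k^T \Delta s_k = 0$ here (since $A\Delta x_k = 0$ and $\Delta s_k = -A^T\Delta y_k$), which is what makes $\mu_{k+1} = \sigma\mu_k$ exact.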
Primal-dual interior-point methods for LO: Polynomial complexity
The following two results follow:
1 If $(x_k, s_k)$ is strictly feasible, then $\mu_{k+1} := x_{k+1}^T s_{k+1}/n = \sigma\mu_k$ (immediate).
2 If $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$, then $(x_{k+1}, y_{k+1}, s_{k+1}) \in \mathcal{N}_2(\beta)$ (takes some work). In particular, $(x_{k+1}, s_{k+1})$ is strictly feasible.
It follows that, if $(x_0, y_0, s_0) \in \mathcal{N}_2(\beta)$, then for all $k$, $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$ and
$$c^T x_k - v_{PD} \le x_k^T s_k = n\mu_k = \sigma^k n\mu_0 = \sigma^k x_0^T s_0,$$
$$v_{PD} - b^T y_k \le x_k^T s_k = n\mu_k = \sigma^k n\mu_0 = \sigma^k x_0^T s_0,$$
a global linear convergence rate.
Hence, given $\epsilon > 0$, $c^T x_k < v_{PD} + \epsilon$ and $b^T y_k > v_{PD} - \epsilon$ are achieved after at most $\sqrt{n}\,\log(x_0^T s_0/\epsilon)$ iterations.
Since every iteration (solution of a $(2n+m) \times (2n+m)$ linear system) takes at most $O(n^3)$ flops, the total number of flops to $\epsilon$-solution is at most $O(n^{7/2})$, which is polynomial in $n$.
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Why conic optimization?
Every convex optimization problem can be recast as a conic optimization problem. (And many convex problems arise naturally as conic optimization problems.)
Consider
$$\text{minimize } f(x) \text{ subject to } x \in S,$$
with $f$ and $S$ convex. This problem is equivalently written as
$$\text{minimize } t \text{ subject to } f(x) \le t,\ x \in S,$$
where $t \in \mathbb{R}$ is an additional optimization variable. Defining $z = (x, t)$, we can recast this problem as
$$\text{minimize } c^T z \text{ subject to } z \in G,$$
where $c = (0 \ldots 0\ 1)^T$ and $G$ is a convex set. This can be further recast as
$$\text{minimize } d^T w \text{ subject to } [0 \ldots 0\ 1]\, w = 1,\ w \in K,$$
where $d^T = (c^T\ 0)$, $w = (z, u)$ ($u \in \mathbb{R}$), and
$$K = \mathrm{cl}\{(z, u) : u > 0,\ z/u \in G\},$$
a closed convex cone. Advantages of conic optimization are shown next.
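To make the chain of recasts concrete, the sketch below (a toy illustration with a made-up one-dimensional $f$ and $S$, not from the talk) solves the same problem in original, epigraph, and sliced-cone form and recovers the same optimal value.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance (made up): f(x) = (x-2)^2, S = [-1, 1]; optimum at x = 1.
f = lambda x: (x - 2.0)**2

# 1) Original form: minimize f over S.
r0 = minimize(lambda v: f(v[0]), x0=[0.0], bounds=[(-1, 1)])

# 2) Epigraph form: minimize t s.t. f(x) <= t, x in S; variables v = (x, t).
r1 = minimize(lambda v: v[1], x0=[0.0, 5.0],
              bounds=[(-1, 1), (None, None)],
              constraints=[{'type': 'ineq', 'fun': lambda v: v[1] - f(v[0])}])

# 3) Conic form: w = (x, t, u), minimize t s.t. [0 0 1]w = 1 and w in
#    K = cl{(x,t,u): u > 0, (x/u, t/u) in G}, G = {(x,t): f(x) <= t, x in S}.
r2 = minimize(lambda w: w[1], x0=[0.0, 5.0, 1.0],
              constraints=[
                  {'type': 'eq',   'fun': lambda w: w[2] - 1.0},
                  {'type': 'ineq', 'fun': lambda w: w[1]/w[2] - f(w[0]/w[2])},
                  {'type': 'ineq', 'fun': lambda w: 1.0 - w[0]/w[2]},
                  {'type': 'ineq', 'fun': lambda w: w[0]/w[2] + 1.0},
              ])

print(r0.fun, r1.fun, r2.fun)   # each ≈ 1.0 (optimum at x = 1)
```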
Duality in conic optimization
Let $K$ be a closed, convex, solid and pointed cone, $K^*$ its dual. (Note: $(\mathbb{R}^n_+)^* = \mathbb{R}^n_+$.)
Primal: $v_P := \min\{c^T x : Ax = b,\ x \in K\}$
Dual: $v_D := \max\{b^T y : A^T y + s = c,\ s \in K^*\}$
For feasible $(x, y, s)$,
$$c^T x - b^T y = (A^T y + s)^T x - y^T b = s^T x.$$
Weak duality: $v_P \ge v_D$. Indeed,
$$v_P - v_D = \inf_{x,y \text{ feasible}}\{c^T x - b^T y\} = \inf_{x,s \text{ feasible}}\{s^T x\} \ge 0.$$
Note: $s^T x = 0$ is no longer equivalent to $x_i s_i = 0$ for all $i$.
Farkas’s Lemma: Always holds only if $MK$ is closed! (Quiz time!!)
$$Mu = d,\ u \in K \ \text{is consistent} \quad \text{XOR} \quad -M^T v \in K^*,\ d^T v > 0 \ \text{is consistent}.$$
[Equivalently: $d \in MK \Leftrightarrow d \in (MK)^{**}$]
Strong duality: If the primal (or the dual) is feasible and $AK$ is closed (or some weaker condition holds), then
$$v_P = v_D =: v_{PD}.$$
Same proof (under the assumption that $AK$ is closed) as in the case of linear optimization!
Key to polynomial complexity: Self-concordance
Yurii Nesterov & Arkadi Nemirovski, 1994.
Given a proper (i.e., closed, convex, solid and pointed) cone $K \subset \mathbb{R}^n$,
$$F : \mathrm{int}(K) \to \mathbb{R}, \text{ three times differentiable, } F''(x) \text{ invertible } \forall x$$
is a $\nu$-logarithmically homogeneous self-concordant barrier if
$F(tx) = F(x) - \nu\log(t)$ $\forall x \in \mathrm{int}(K)$, $t > 0$ ($\nu$-logarithmically homogeneous)
$|F'''(x)[h, h, h]| \le 2(F''(x)[h, h])^{3/2}$ $\forall x \in \mathrm{int}(K)$, $h \in \mathbb{R}^n$ (self-concordant)
$F(x_k) \to \infty$ whenever $\{x_k\} \subset \mathrm{int}(K)$ and $x_k \to x \in \mathrm{bd}(K)$ (barrier)
Examples:
$K = \mathbb{R}_+$, $F(x) = -\log(x)$, $\nu = 1$
$K = \mathbb{R}^n_+$, $F(x) = -\sum \log(x_i)$, $\nu = n$ (linear optimization)
$K = \mathbb{S}^n_+$, $F(x) = -\log\det x$, $\nu = n$ (semidefinite programming)
This property is key to polynomial complexity. ν is the “complexity parameter”.
Define the “dual local norm”
$$\|s\|^*_{x,F} = (F''(x)^{-1}[s, s])^{1/2}.$$
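For the scalar barrier $F(x) = -\log(x)$ the self-concordance inequality holds with equality, since $F''(x) = 1/x^2$ and $F'''(x) = -2/x^3$; both defining properties are easy to check numerically (an illustrative sketch):

```python
import numpy as np

# Barrier F(x) = -log(x) on K = R_+: derivatives in closed form.
F2 = lambda x: 1.0 / x**2    # F''(x)
F3 = lambda x: -2.0 / x**3   # F'''(x)

xs = np.linspace(0.1, 10.0, 100)
# Self-concordance: |F'''(x)| <= 2 * F''(x)^{3/2}; here, equality.
print(np.allclose(np.abs(F3(xs)), 2.0 * F2(xs)**1.5))   # True

# nu-logarithmic homogeneity with nu = 1: F(tx) = F(x) - log(t).
t, x = 3.0, 2.0
print(np.isclose(-np.log(t * x), -np.log(x) - np.log(t)))   # True
```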
Primal-dual interior-point methods for CO
Let $\beta \in (0, 1)$. Let $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$, where, with $\mu_k = x_k^T s_k/\nu$,
$$\mathcal{N}_2(\beta) = \{(x_k, y_k, s_k) : \text{PD feasible},\ \|s_k + \mu_k F'(x_k)\|^*_{x_k,F} < \beta\mu_k\}.$$
Perform a Newton iteration towards solving the square system, with $\sigma \in (0, 1)$,
$Ax = b$ (primal feasibility)
$A^T y + s = c$ (dual feasibility)
$s + \sigma\mu_k F'(x) = 0$ (generalizes $Xs = \sigma\mu_k e$)
Thus, solve the Newton system
$$A\Delta x_k = 0, \qquad A^T\Delta y_k + \Delta s_k = 0, \qquad \mu_k F''(x_k)\Delta x_k + \Delta s_k = -\sigma\mu_k F'(x_k) - s_k,$$
and set
$$x_{k+1} = x_k + \Delta x_k, \qquad y_{k+1} = y_k + \Delta y_k, \qquad s_{k+1} = s_k + \Delta s_k.$$
It turns out that with, e.g., $\sigma = 1 - 1/\sqrt{\nu}$, this simple algorithm again converges, with the same complexity bound as in the LO case, with $\nu$ replacing $n$.
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Inexact function evaluation in conic optimization
Simon P. Schurr’s PhD dissertation. Advisors: Dianne O’Leary, Andre Tits.
In practice, it is often rather expensive, or even impossible, to evaluate $F'(x)$ and $F''(x)$ exactly.
Instead, consider solving the approximate Newton system
$$A\Delta x_k = 0, \qquad A^T\Delta y_k + \Delta s_k = 0, \qquad \mu_k F_2(x_k)\Delta x_k + \Delta s_k = -\sigma\mu_k F_1(x_k) - s_k,$$
where the “errors” $E_1 := F_1 - F'$ and $E_2 := F_2 - F''$ are small, in the sense that, for given $\epsilon_1, \epsilon_2 > 0$,
$$\frac{\|E_1(x)\|^*_{x,F}}{\|F'(x)\|^*_{x,F}} \le \epsilon_1 \quad \text{for all } x \in \mathrm{int}(K),$$
$$\|F''(x)^{-1/2} E_2(x) F''(x)^{-1/2}\|_2 \le \epsilon_2 \quad \text{for all } x \in \mathrm{int}(K).$$
Conditions were obtained, relating $\beta$, $\sigma$, $\epsilon_1$, $\epsilon_2$, and $\nu$ (allowing for arbitrarily large $\nu$), under which the same polynomial complexity order is preserved.
Problems with many inequality constraints
Luke Winternitz, Stacey Nicholls, Pierre-Antoine Absil (CU Louvain), Dianne O’Leary, Andre Tits.
Main results so far are for linear optimization. Consider the dual form
$$\max\{b^T y : A^T y \ge c\}$$
with $A \in \mathbb{R}^{m \times n}$, $n \gg m$, i.e., the dual problem has many more (inequality) constraints than variables. Such a situation is common in practice (e.g., finely discretized semi-infinite problems).
Idea: At each iteration, select a small subset $Q$ of seemingly critical constraints, and construct a search direction based on those only. The “constraint-reduction” scheme should be implementable in popular variants of primal-dual interior-point algorithms.
Theoretical results: A simple rule for selecting $Q$ was identified (which leaves room for heuristics) under which global convergence, and local quadratic convergence, are guaranteed, under nondegeneracy assumptions, for algorithms including Mehrotra’s Predictor-Corrector algorithm, the current “champion” algorithm.
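The slide does not spell out the selection rule; as a purely hypothetical illustration of the idea (not the authors' rule), the heuristic below keeps the $q$ constraints of $A^T y \ge c$ with smallest slack, i.e., the most nearly active ones.

```python
import numpy as np

def select_working_set(A, c, y, q):
    """Hypothetical constraint-selection heuristic (not the authors' rule):
    keep the q inequality constraints a_i' y >= c_i with smallest slack."""
    slack = A.T @ y - c            # nonnegative for dual-feasible y
    return np.argsort(slack)[:q]   # indices of most nearly active constraints

rng = np.random.default_rng(0)
m, n, q = 5, 100, 12
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
c = A.T @ y - rng.uniform(0.0, 1.0, size=n)   # makes y strictly feasible
Q = select_working_set(A, c, y, q)
print(len(Q))   # 12
```

A search direction is then built from the $m \times q$ submatrix $A_{:,Q}$ only, reducing the per-iteration cost when $q \ll n$.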
Problems with many inequality constraints: Numerical tests
m = 200, n = 40000
Generate: A, b, y0 ∼ N (0, 1), s0 ∼ U(0, 1). Normalize columns of A.
Set c = ATy0 + s0 and x0 = e.
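The instance generation on this slide can be reproduced directly; by construction $(y_0, s_0)$ is strictly dual feasible ($A^T y_0 + s_0 = c$, $s_0 > 0$). A sketch at a smaller size (the slide uses $m = 200$, $n = 40000$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 400   # slide uses m = 200, n = 40000; smaller here for speed

A = rng.normal(size=(m, n))
A /= np.linalg.norm(A, axis=0)   # normalize columns of A
b = rng.normal(size=m)
y0 = rng.normal(size=m)
s0 = rng.uniform(size=n)         # s0 ~ U(0,1), so s0 > 0
c = A.T @ y0 + s0                # (y0, s0) dual feasible by construction
x0 = np.ones(n)                  # x0 = e

print(np.allclose(A.T @ y0 + s0, c), (s0 > 0).all(), (x0 > 0).all())
```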
[Plots: CPU time and iteration counts.]
Summary
Simple primal-dual interior-point methods solve linear optimization problems in polynomial time.
Convex optimization problems arise everywhere you look.
Convex optimization problems are readily recast into conic optimization problems.
Conic optimization problems are (formally) rather similar to linear optimization problems.
“Minor” generalizations of PD-IPMs for LO solve CO problems in polynomial time.
Thank You!
Slides are available from: www.isr.umd.edu/~andre/ECE-Colloquium.pdf
A solution to the quiz. The following is an example with $n = 3$ and $A \in \mathbb{R}^{2 \times 3}$:
$$K = \{(x, y, z) : (x - z)^2 + y^2 \le z^2\}, \qquad A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},$$
i.e., $K$ is an (infinitely high) “ice cream cone” tangent to the vertical axis, and $A$ is the orthogonal projection onto the $(x, y)$ plane. It is readily checked that
$$AK = \{(0, 0)\} \cup \{(x, y) : x > 0\},$$
which is a convex, solid cone (as it should be) but, clearly, it is not closed. It is also not pointed, so its closure is not a proper cone.
A take-home quiz: Come up with an instance where $AK$ is not closed (but $K$ is closed), and in addition its closure is a proper cone (closed, convex, solid and pointed).