On Modern Convex Optimization: The Interior-Point Revolution
Andre Tits
University of Maryland - College Park
May 16, 2008
Andre Tits (UMD) On Modern Convex Optimization May 16, 2008 1 / 24
My plan for today: Convey to you the pervasiveness, the power, the elegance, and, I hope, the excitement of modern convex optimization.
Disclaimer: This is but a glimpse at convex optimization, with focus on simple key ideas rather than on the most efficient methods. Rigor is not guaranteed.
No need to take notes: Slides are available for download.
Good news, bad news.
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Convex set, Convex function
A set $S$ is convex if it contains every line segment whose endpoints are in $S$.
A function f is convex if its epigraph is convex.
When $S$ and $f$ are convex, all stationary points for the problem
$$\text{minimize } f(x) \text{ subject to } x \in S$$
are global minimizers.
Convex optimization problems are “easy” to solve.
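A definition this clean invites a quick numerical sanity check. The sketch below is illustrative only (not from the talk): it tests Jensen's inequality $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$ on random segments, which can refute convexity but only suggest it.

```python
import numpy as np

def looks_convex(f, dim=2, trials=1000, seed=0):
    """Numerically test Jensen's inequality on random segments.

    Returns False if a violation is found; True means no violation
    was observed, which suggests (but does not prove) convexity.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=(2, dim))
        lam = rng.uniform()
        z = lam * x + (1 - lam) * y
        if f(z) > lam * f(x) + (1 - lam) * f(y) + 1e-12:
            return False
    return True

print(looks_convex(lambda v: np.sum(v**2)))    # convex: True
print(looks_convex(lambda v: -np.sum(v**2)))   # concave: False
```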
In the past two decades, a quasi-revolution took place in the development and analysis of effective and efficient methods for the solution of convex problems, in particular interior-point methods, starting with the ground-breaking work of Nesterov and Nemirovski (following that of Karmarkar in linear optimization).
Convex problems are ubiquitous
A few areas of application:
signal processing
VLSI design
estimation
communication networks
circuit design
automatic control
statistics
finance
etc.
Example: VLSI gate sizing
Let $t_0 = 0$, and let $T > 0$, $\underline{s}_i, \overline{s}_i > 0$, and $a_{ij} > 0$ be given.
The minimal area gate sizing problem can be written as
$$\min_{s_i, t_i}\ \sum_i s_i \quad \text{s.t.} \quad t_j + d_i(s) \le t_i \text{ whenever } j \in \text{fanin}(i), \quad t_n \le T, \quad \underline{s}_i \le s_i \le \overline{s}_i\ \forall i,$$
where $s_i$ is the size (area) of gate $i$, $t_i$ is an upper bound on the delay from the primary input to the output of gate $i$, and
$$d_i(s) := a_{i0} + \sum_{k \in \text{fanout}(i)} \frac{a_{ik} s_k}{s_i}.$$
Equivalently,
$$\min_{s_i, t_i}\ \sum_i s_i \quad \text{s.t.} \quad a_{i0} + \sum_{k \in \text{fanout}(i)} a_{ik} s_k s_i^{-1} \le t_i - t_j, \quad t_n \le T, \quad \underline{s}_i \le s_i \le \overline{s}_i.$$
The cost and constraint functions being “posynomials”, this is a geometric programming problem. It is NOT convex, but it can be “convexified”, as we show next.
VLSI gate sizing: Recasting into a convex problem
The change of variables $x_i := \log(s_i)$ yields
$$\min_{x_i, t_i}\ \sum_i e^{x_i} \quad \text{s.t.} \quad a_{i0} + \sum_{k \in \text{fanout}(i)} a_{ik} e^{x_k - x_i} \le t_i - t_j, \quad t_n \le T, \quad \log(\underline{s}_i) \le x_i \le \log(\overline{s}_i),$$
which is convex. Because exponential functions tend to cause numerical difficulties, typically the following reformulation (obtained by taking logs) is used instead:
$$\text{minimize } \log\Big(\sum_i e^{x_i}\Big) \quad \text{s.t.} \quad \log\Big(a_{i0} + \sum_{k \in \text{fanout}(i)} a_{ik} e^{x_k - x_i}\Big) - \log(t_i - t_j) \le 0, \quad t_n \le T, \quad \log(\underline{s}_i) \le x_i \le \log(\overline{s}_i),$$
which is still convex.
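As a concrete illustration (not from the talk), the convexified problem can be handed to a general-purpose NLP solver. Below is a minimal sketch for a hypothetical two-gate chain; the coefficients `a10`, `a12`, `a20`, `a2L`, the delay budget `T`, and the size bounds are all made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical two-gate chain: gate 1 drives gate 2, which drives a fixed load.
# Delay model: d_i(s) = a_i0 + sum_{k in fanout(i)} a_ik * s_k / s_i.
a10, a12 = 1.0, 2.0   # made-up coefficients for gate 1
a20, a2L = 1.0, 3.0   # gate 2: fixed-load term a2L / s_2
T = 8.0               # made-up delay budget
lo, hi = np.log(0.5), np.log(10.0)   # log-size bounds

# Variables v = (x1, x2, t1, t2) with x_i = log(s_i); area = e^x1 + e^x2.
cost = lambda v: np.exp(v[0]) + np.exp(v[1])

cons = [
    # t0 = 0, so d1(s) <= t1:  a10 + a12 * e^{x2 - x1} <= t1
    {'type': 'ineq', 'fun': lambda v: v[2] - (a10 + a12 * np.exp(v[1] - v[0]))},
    # t1 + d2(s) <= t2:  a20 + a2L * e^{-x2} <= t2 - t1
    {'type': 'ineq', 'fun': lambda v: v[3] - v[2] - (a20 + a2L * np.exp(-v[1]))},
    {'type': 'ineq', 'fun': lambda v: T - v[3]},   # t2 <= T
]
bounds = [(lo, hi), (lo, hi), (0, T), (0, T)]

# Feasible starting point; SLSQP then shrinks the area until delays bind.
res = minimize(cost, x0=[1.0, 1.0, 4.0, 8.0], bounds=bounds,
               constraints=cons, method='SLSQP')
print(res.success, np.exp(res.x[:2]))   # optimal sizes s1, s2
```

Each constraint set here is convex (each `fun` is concave in $v$), matching the claim on the slide.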
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Duality in linear optimization
Primal: $v_P := \min\{c^T x : Ax = b,\ x \in \mathbb{R}^n_+\}$
Dual: $v_D := \max\{b^T y : A^T y + s = c,\ s \in \mathbb{R}^n_+\}$
For feasible $(x, y, s)$, the “duality gap” is $s^T x$:
$$c^T x - b^T y = (A^T y + s)^T x - y^T b = s^T x.$$
Weak duality: $v_P \ge v_D$. Indeed,
$$v_P - v_D = \inf_{x,y \text{ feasible}}\{c^T x - b^T y\} = \inf_{x,s \text{ feasible}}\{s^T x\} \ge 0.$$
Note: For feasible $(x, s)$, $s^T x = 0$ iff $x_i s_i = 0$ for all $i$.
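The gap identity is easy to check numerically: build a feasible primal-dual pair by construction (random made-up data) and compare $c^T x - b^T y$ with $s^T x$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.normal(size=(m, n))

# Build a feasible primal-dual pair by construction.
x = rng.uniform(0.1, 1.0, size=n)   # x > 0
b = A @ x                           # primal feasible by construction
y = rng.normal(size=m)
s = rng.uniform(0.1, 1.0, size=n)   # s > 0
c = A.T @ y + s                     # dual feasible by construction

gap = c @ x - b @ y
print(np.isclose(gap, s @ x), gap >= 0)   # True True
```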
(Gyula, aka Julius) Farkas’s Lemma (1902; present form due to Albert Tucker):
$$Mu = d,\ u \in \mathbb{R}^n_+ \ \text{is consistent} \quad \text{XOR} \quad -M^T v \in \mathbb{R}^n_+,\ d^T v > 0 \ \text{is consistent}.$$
Strong duality: If the primal (or the dual) is feasible,
$$v_P = v_D =: v_{PD}.$$
Proof follows from Farkas’s lemma: see next slide (skipped).
Proof of strong duality in linear optimization
Below, “icst” means “is consistent”. Given $\theta \in \mathbb{R}$, using Farkas’s lemma, we get
$$\left\{\begin{array}{l} Ax = b \\ x \ge 0 \\ c^T x \le \theta \end{array}\right\} \text{icst} \iff \left\{\begin{array}{l} \begin{bmatrix} A & 0 \\ c^T & 1 \end{bmatrix}\begin{bmatrix} x \\ t \end{bmatrix} = \begin{bmatrix} b \\ \theta \end{bmatrix} \\ (x, t) \in \mathbb{R}^{n+1}_+ \end{array}\right\} \text{icst} \quad \text{XOR} \quad \left\{\begin{array}{l} -\begin{bmatrix} A^T & c \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y \\ u \end{bmatrix} \in \mathbb{R}^{n+1}_+ \\ \begin{bmatrix} b^T & \theta \end{bmatrix}\begin{bmatrix} y \\ u \end{bmatrix} > 0 \end{array}\right\} \text{icst}.$$
Now suppose that the dual is feasible and $\theta < v_P$. Then the left-hand side is inconsistent, and hence the right-hand side must be consistent. That is, there exists $(y, u) \in \mathbb{R}^{m+1}$ such that
$$A^T y + cu \le 0, \quad u \le 0, \quad b^T y + \theta u > 0.$$
If $u = 0$ then $v_D = +\infty$. (Indeed, take $y = y_0 + \alpha y$, with $y_0$ feasible (we assumed dual feasibility) and $\alpha \to +\infty$.) Hence $v_D \ge v_P$. On the other hand, if $u < 0$ then, w.l.o.g., $u = -1$. We then get
$$A^T y \le c \quad \text{and} \quad b^T y > \theta.$$
The existence of such $y$ implies that $v_D > \theta$. We conclude that $v_D > \theta$ whenever $v_P > \theta$, again implying that $v_D \ge v_P$. In view of weak duality, strong duality follows.
With primal feasibility assumed instead, the claim follows with a similar proof, starting by taking $\theta > v_D$ and expressing “$A^T y \le c$, $b^T y \ge \theta$” as
$$A^T(y_+ - y_-) + s = c, \quad b^T(y_+ - y_-) - t = \theta, \quad (s, t, y_+, y_-) \ge 0.$$
Primal-dual interior-point methods for LO: Set up
Notation: Capital letters $X$, $S$, $X_k$, $S_k$ denote diagonal matrices whose entries are the components of the corresponding lower-case vectors. E.g., $X = \mathrm{diag}(x_i)$.
In view of strong duality, a triple $(x, y, s)$ solves the primal-dual pair iff:
$Ax = b$, $x \ge 0$ (primal feasibility) (also $\nabla_y L_D = 0$)
$A^T y + s = c$, $s \ge 0$ (dual feasibility) (also $\nabla_x L_P = 0$)
$x^T s = 0$ (zero duality gap), i.e., $Xs = 0$ (complementary slackness)
Let $\beta \in (0, 1)$. Let $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$, where
$$\mathcal{N}_2(\beta) = \{(x_k, y_k, s_k) : \text{PD feasible},\ \|X_k s_k - \mu_k e\| < \beta\mu_k\},$$
with $\mu_k := x_k^T s_k/n$ (the “duality measure”).
Key idea: Focusing on the equalities in the optimality condition, perform a Newton iteration towards solving the square system, with $\sigma \in (0, 1)$,
$$Ax = b, \qquad A^T y + s = c, \qquad Xs = \sigma\mu_k e.$$
Primal-dual interior-point methods for LO: “Short Step” method
Why $Xs = \sigma\mu_k e$ rather than $Xs = 0$? With $\sigma$ close enough to 1, this will make the full Newton step acceptable globally.
Thus, solve the Newton system
$$A\Delta x_k = 0, \qquad A^T\Delta y_k + \Delta s_k = 0, \qquad X_k s_k + X_k\Delta s_k + S_k\Delta x_k = \sigma\mu_k e,$$
and set
$$x_{k+1} = x_k + \Delta x_k, \qquad y_{k+1} = y_k + \Delta y_k, \qquad s_{k+1} = s_k + \Delta s_k.$$
It turns out that, with an appropriate selection of $\sigma$ (e.g., $\sigma = 1 - 1/\sqrt{n}$), this exceedingly simple algorithm has rather strong convergence properties.
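The short-step method fits in a few dozen lines of numpy. The sketch below is illustrative, not production code: it assumes a strictly feasible, well-centered starting triple is supplied (finding one is a separate matter), and it uses the slightly more conservative classical constant $\sigma = 1 - 0.4/\sqrt{n}$, of the same form as the slide's $1 - 1/\sqrt{n}$, to keep the iterates safely inside $\mathcal{N}_2(\beta)$.

```python
import numpy as np

def short_step_ipm(A, b, c, x, y, s, tol=1e-8, max_iter=1000):
    """Short-step primal-dual IPM for min c'x s.t. Ax = b, x >= 0.

    Assumes (x, y, s) strictly feasible and well centered. Each iteration
    takes one full Newton step on the perturbed optimality system
    Ax = b, A'y + s = c, Xs = sigma*mu*e.
    """
    m, n = A.shape
    sigma = 1.0 - 0.4 / np.sqrt(n)   # slide suggests 1 - 1/sqrt(n);
                                     # conservative constant used here
    for _ in range(max_iter):
        mu = x @ s / n               # duality measure
        if n * mu < tol:             # duality gap x's = n*mu
            break
        # Block-eliminate dx, ds: normal equations in dy,
        #   A D A' dy = A (x - sigma*mu/s),  D = diag(x/s),
        # then ds = -A' dy and dx = sigma*mu/s - x - D ds.
        d = x / s
        dy = np.linalg.solve(A @ (d[:, None] * A.T),
                             A @ (x - sigma * mu / s))
        ds = -A.T @ dy
        dx = sigma * mu / s - x - d * ds
        x, y, s = x + dx, y + dy, s + ds   # full step, no line search
    return x, y, s

# Demo on a random instance with a perfectly centered starting point:
rng = np.random.default_rng(0)
m, n = 3, 25
A = rng.normal(size=(m, n))
x0, s0, y0 = np.ones(n), np.ones(n), np.zeros(m)
b, c = A @ x0, A.T @ y0 + s0     # (x0, y0, s0) feasible with X0 s0 = e
x, y, s = short_step_ipm(A, b, c, x0, y0, s0)
print(x @ s < 1e-8, np.allclose(A @ x, b))
```

Note that $\Delta x_k^T \Delta s_k = 0$ here (since $A\Delta x_k = 0$ and $\Delta s_k = -A^T\Delta y_k$), which is what makes $\mu_{k+1} = \sigma\mu_k$ exact.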
Primal-dual interior-point methods for LO: Polynomial complexity
The following two results follow:
1 If $(x_k, s_k)$ is strictly feasible, then $\mu_{k+1} := x_{k+1}^T s_{k+1}/n = \sigma\mu_k$ (immediate).
2 If $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$, then $(x_{k+1}, y_{k+1}, s_{k+1}) \in \mathcal{N}_2(\beta)$ (takes some work). In particular, $(x_{k+1}, s_{k+1})$ is strictly feasible.
It follows that, if $(x_0, y_0, s_0) \in \mathcal{N}_2(\beta)$, then for all $k$, $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$ and
$$c^T x_k - v_{PD} \le x_k^T s_k = n\mu_k = \sigma^k n\mu_0 = \sigma^k x_0^T s_0,$$
$$v_{PD} - b^T y_k \le x_k^T s_k = n\mu_k = \sigma^k n\mu_0 = \sigma^k x_0^T s_0,$$
a global linear convergence rate.
Hence, given $\epsilon > 0$, $c^T x_k < v_{PD} + \epsilon$ and $b^T y_k > v_{PD} - \epsilon$ are achieved after at most $\sqrt{n}\,\log(x_0^T s_0/\epsilon)$ iterations.
Since every iteration (solution of a $(2n+m) \times (2n+m)$ linear system) takes at most $O(n^3)$ flops, the total number of flops to $\epsilon$-solution is at most $O(n^{7/2})$, which is polynomial in $n$.
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Why conic optimization?
Every convex optimization problem can be recast as a conic optimization problem. (And many convex problems arise naturally as conic optimization problems.)
Consider
$$\text{minimize } f(x) \text{ subject to } x \in S,$$
with $f$ and $S$ convex. This problem is equivalently written as
$$\text{minimize } t \text{ subject to } f(x) \le t,\ x \in S,$$
where $t \in \mathbb{R}$ is an additional optimization variable. Defining $z = (x, t)$, we can recast this problem as
$$\text{minimize } c^T z \text{ subject to } z \in G,$$
where $c = (0 \ldots 0\ 1)^T$ and $G$ is a convex set. This can be further recast as
$$\text{minimize } d^T w \text{ subject to } [0 \ldots 0\ 1]\, w = 1,\ w \in K,$$
where $d^T = (c^T\ 0)$, $w = (z, u)$ ($u \in \mathbb{R}$), and
$$K = \mathrm{cl}\{(z, u) : u > 0,\ z/u \in G\},$$
a closed convex cone. Advantages of conic optimization are shown next.
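To make the chain of recasts concrete, the sketch below (a toy illustration with a made-up one-dimensional $f$ and $S$, not from the talk) solves the same problem in original, epigraph, and sliced-cone form and recovers the same optimal value.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance (made up): f(x) = (x-2)^2, S = [-1, 1]; optimum at x = 1.
f = lambda x: (x - 2.0)**2

# 1) Original form: minimize f over S.
r0 = minimize(lambda v: f(v[0]), x0=[0.0], bounds=[(-1, 1)])

# 2) Epigraph form: minimize t s.t. f(x) <= t, x in S; variables v = (x, t).
r1 = minimize(lambda v: v[1], x0=[0.0, 5.0],
              bounds=[(-1, 1), (None, None)],
              constraints=[{'type': 'ineq', 'fun': lambda v: v[1] - f(v[0])}])

# 3) Conic form: w = (x, t, u), minimize t s.t. [0 0 1]w = 1 and w in
#    K = cl{(x,t,u): u > 0, (x/u, t/u) in G}, G = {(x,t): f(x) <= t, x in S}.
r2 = minimize(lambda w: w[1], x0=[0.0, 5.0, 1.0],
              constraints=[
                  {'type': 'eq',   'fun': lambda w: w[2] - 1.0},
                  {'type': 'ineq', 'fun': lambda w: w[1]/w[2] - f(w[0]/w[2])},
                  {'type': 'ineq', 'fun': lambda w: 1.0 - w[0]/w[2]},
                  {'type': 'ineq', 'fun': lambda w: w[0]/w[2] + 1.0},
              ])

print(r0.fun, r1.fun, r2.fun)   # each ≈ 1.0 (optimum at x = 1)
```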
Duality in conic optimization
Let $K$ be a closed, convex, solid and pointed cone, $K^*$ its dual. (Note: $(\mathbb{R}^n_+)^* = \mathbb{R}^n_+$.)
Primal: $v_P := \min\{c^T x : Ax = b,\ x \in K\}$
Dual: $v_D := \max\{b^T y : A^T y + s = c,\ s \in K^*\}$
For feasible $(x, y, s)$,
$$c^T x - b^T y = (A^T y + s)^T x - y^T b = s^T x.$$
Weak duality: $v_P \ge v_D$. Indeed,
$$v_P - v_D = \inf_{x,y \text{ feasible}}\{c^T x - b^T y\} = \inf_{x,s \text{ feasible}}\{s^T x\} \ge 0.$$
Note: $s^T x = 0$ is no longer equivalent to $x_i s_i = 0$ for all $i$.
Farkas’s Lemma: Always holds only if $MK$ is closed! (Quiz time!!)
$$Mu = d,\ u \in K \ \text{is consistent} \quad \text{XOR} \quad -M^T v \in K^*,\ d^T v > 0 \ \text{is consistent}.$$
[Equivalently: $d \in MK \Leftrightarrow d \in (MK)^{**}$]
Strong duality: If the primal (or the dual) is feasible and $AK$ is closed (or some weaker condition holds), then
$$v_P = v_D =: v_{PD}.$$
Same proof (under the assumption that $AK$ is closed) as in the case of linear optimization!
Key to polynomial complexity: Self-concordance
Yurii Nesterov & Arkadi Nemirovski, 1994.
Given a proper (i.e., closed, convex, solid and pointed) cone $K \subset \mathbb{R}^n$,
$$F : \mathrm{int}(K) \to \mathbb{R}, \text{ three times differentiable, } F''(x) \text{ invertible } \forall x$$
is a $\nu$-logarithmically homogeneous self-concordant barrier if
$F(tx) = F(x) - \nu\log(t)$ $\forall x \in \mathrm{int}(K)$, $t > 0$ ($\nu$-logarithmically homogeneous)
$|F'''(x)[h, h, h]| \le 2(F''(x)[h, h])^{3/2}$ $\forall x \in \mathrm{int}(K)$, $h \in \mathbb{R}^n$ (self-concordant)
$F(x_k) \to \infty$ whenever $\{x_k\} \subset \mathrm{int}(K)$ and $x_k \to x \in \mathrm{bd}(K)$ (barrier)
Examples:
$K = \mathbb{R}_+$, $F(x) = -\log(x)$, $\nu = 1$
$K = \mathbb{R}^n_+$, $F(x) = -\sum \log(x_i)$, $\nu = n$ (linear optimization)
$K = \mathbb{S}^n_+$, $F(x) = -\log\det x$, $\nu = n$ (semidefinite programming)
This property is key to polynomial complexity. ν is the “complexity parameter”.
Define the “dual local norm”
$$\|s\|^*_{x,F} = (F''(x)^{-1}[s, s])^{1/2}.$$
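For the scalar barrier $F(x) = -\log(x)$ the self-concordance inequality holds with equality, since $F''(x) = 1/x^2$ and $F'''(x) = -2/x^3$; both defining properties are easy to check numerically (an illustrative sketch):

```python
import numpy as np

# Barrier F(x) = -log(x) on K = R_+: derivatives in closed form.
F2 = lambda x: 1.0 / x**2    # F''(x)
F3 = lambda x: -2.0 / x**3   # F'''(x)

xs = np.linspace(0.1, 10.0, 100)
# Self-concordance: |F'''(x)| <= 2 * F''(x)^{3/2}; here, equality.
print(np.allclose(np.abs(F3(xs)), 2.0 * F2(xs)**1.5))   # True

# nu-logarithmic homogeneity with nu = 1: F(tx) = F(x) - log(t).
t, x = 3.0, 2.0
print(np.isclose(-np.log(t * x), -np.log(x) - np.log(t)))   # True
```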
Primal-dual interior-point methods for CO
Let $\beta \in (0, 1)$. Let $(x_k, y_k, s_k) \in \mathcal{N}_2(\beta)$, where, with $\mu_k = x_k^T s_k/\nu$,
$$\mathcal{N}_2(\beta) = \{(x_k, y_k, s_k) : \text{PD feasible},\ \|s_k + \mu_k F'(x_k)\|^*_{x_k,F} < \beta\mu_k\}.$$
Perform a Newton iteration towards solving the square system, with $\sigma \in (0, 1)$,
$Ax = b$ (primal feasibility)
$A^T y + s = c$ (dual feasibility)
$s + \sigma\mu_k F'(x) = 0$ (generalizes $Xs = \sigma\mu_k e$)
Thus, solve the Newton system
$$A\Delta x_k = 0, \qquad A^T\Delta y_k + \Delta s_k = 0, \qquad \mu_k F''(x_k)\Delta x_k + \Delta s_k = -\sigma\mu_k F'(x_k) - s_k,$$
and set
$$x_{k+1} = x_k + \Delta x_k, \qquad y_{k+1} = y_k + \Delta y_k, \qquad s_{k+1} = s_k + \Delta s_k.$$
It turns out that with, e.g., $\sigma = 1 - 1/\sqrt{\nu}$, this simple algorithm again converges, with the same complexity bound as in the LO case, with $\nu$ replacing $n$.
Outline
1 Convex Optimization: What? Why? Where?
2 Simplest Case: Linear Optimization
3 General Case: Conic Optimization
4 Recent Accomplishments at Maryland
Inexact function evaluation in conic optimization
Simon P. Schurr’s PhD dissertation. Advisors: Dianne O’Leary, Andre Tits.
In practice, it is often rather expensive, or even impossible, to evaluate $F'(x)$ and $F''(x)$ exactly.
Instead, consider solving the approximate Newton system
$$A\Delta x_k = 0, \qquad A^T\Delta y_k + \Delta s_k = 0, \qquad \mu_k F_2(x_k)\Delta x_k + \Delta s_k = -\sigma\mu_k F_1(x_k) - s_k,$$
where the “errors” $E_1 := F_1 - F'$ and $E_2 := F_2 - F''$ are small, in the sense that, for given $\epsilon_1, \epsilon_2 > 0$,
$$\frac{\|E_1(x)\|^*_{x,F}}{\|F'(x)\|^*_{x,F}} \le \epsilon_1 \quad \text{for all } x \in \mathrm{int}(K),$$
$$\|F''(x)^{-1/2} E_2(x) F''(x)^{-1/2}\|_2 \le \epsilon_2 \quad \text{for all } x \in \mathrm{int}(K).$$
Conditions were obtained, relating $\beta$, $\sigma$, $\epsilon_1$, $\epsilon_2$, and $\nu$ (allowing for arbitrarily large $\nu$), under which the same polynomial complexity order is preserved.
Problems with many inequality constraints
Luke Winternitz, Stacey Nicholls, Pierre-Antoine Absil (CU Louvain), Dianne O’Leary, Andre Tits.
Main results so far are for linear optimization. Consider the dual form
$$\max\{b^T y : A^T y \ge c\}$$
with $A \in \mathbb{R}^{m \times n}$, $n \gg m$, i.e., the dual problem has many more (inequality) constraints than variables. Such a situation is common in practice (e.g., finely discretized semi-infinite problems).
Idea: At each iteration, select a small subset $Q$ of seemingly critical constraints, and construct a search direction based on those only. The “constraint-reduction” scheme should be implementable in popular variants of primal-dual interior-point algorithms.
Theoretical results: A simple rule for selecting $Q$ was identified (which leaves room for heuristics) under which global convergence, and local quadratic convergence, are guaranteed, under nondegeneracy assumptions, for algorithms including Mehrotra’s Predictor-Corrector algorithm, the current “champion” algorithm.
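The slide does not spell out the selection rule; as a purely hypothetical illustration of the idea (not the authors' rule), the heuristic below keeps the $q$ constraints of $A^T y \ge c$ with smallest slack, i.e., the most nearly active ones.

```python
import numpy as np

def select_working_set(A, c, y, q):
    """Hypothetical constraint-selection heuristic (not the authors' rule):
    keep the q inequality constraints a_i' y >= c_i with smallest slack."""
    slack = A.T @ y - c            # nonnegative for dual-feasible y
    return np.argsort(slack)[:q]   # indices of most nearly active constraints

rng = np.random.default_rng(0)
m, n, q = 5, 100, 12
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
c = A.T @ y - rng.uniform(0.0, 1.0, size=n)   # makes y strictly feasible
Q = select_working_set(A, c, y, q)
print(len(Q))   # 12
```

A search direction is then built from the $m \times q$ submatrix $A_{:,Q}$ only, reducing the per-iteration cost when $q \ll n$.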
Problems with many inequality constraints: Numerical tests
m = 200, n = 40000
Generate: A, b, y0 ∼ N (0, 1), s0 ∼ U(0, 1). Normalize columns of A.
Set c = ATy0 + s0 and x0 = e.
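The instance generation on this slide can be reproduced directly; by construction $(y_0, s_0)$ is strictly dual feasible ($A^T y_0 + s_0 = c$, $s_0 > 0$). A sketch at a smaller size (the slide uses $m = 200$, $n = 40000$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 400   # slide uses m = 200, n = 40000; smaller here for speed

A = rng.normal(size=(m, n))
A /= np.linalg.norm(A, axis=0)   # normalize columns of A
b = rng.normal(size=m)
y0 = rng.normal(size=m)
s0 = rng.uniform(size=n)         # s0 ~ U(0,1), so s0 > 0
c = A.T @ y0 + s0                # (y0, s0) dual feasible by construction
x0 = np.ones(n)                  # x0 = e

print(np.allclose(A.T @ y0 + s0, c), (s0 > 0).all(), (x0 > 0).all())
```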
[Plots: CPU time and iteration counts.]
Summary
Simple primal-dual interior-point methods solve linear optimization problems in polynomial time.
Convex optimization problems arise everywhere you look.
Convex optimization problems are readily recast into conic optimization problems.
Conic optimization problems are (formally) rather similar to linear optimization problems.
“Minor” generalizations of PD-IPMs for LO solve CO problems in polynomial time.
Thank You!
Slides are available from: www.isr.umd.edu/~andre/ECE-Colloquium.pdf
A solution to the quiz. The following is an example with $n = 3$ and $A \in \mathbb{R}^{2 \times 3}$:
$$K = \{(x, y, z) : (x - z)^2 + y^2 \le z^2\}, \qquad A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},$$
i.e., $K$ is an (infinitely high) “ice cream cone” tangent to the vertical axis, and $A$ is the orthogonal projection onto the $(x, y)$ plane. It is readily checked that
$$AK = \{(0, 0)\} \cup \{(x, y) : x > 0\},$$
which is a convex, solid cone (as it should be) but, clearly, it is not closed. It is also not pointed, so its closure is not a proper cone.
A take-home quiz: Come up with an instance where $AK$ is not closed (but $K$ is closed), and in addition its closure is a proper cone (closed, convex, solid and pointed).