
Duality Theory of Constrained Optimization

Robert M. Freund

April, 2014

© 2014 Massachusetts Institute of Technology. All rights reserved.


1 The Practical Importance of Duality

Duality is pervasive in nonlinear (and linear) optimization models in a wide variety of engineering and mathematical settings. Some elementary examples of models where duality plays an important role are:

Electrical networks. The current flows are “primal variables” and the voltage differences are the “dual variables” that arise in consideration of optimization (and equilibrium) in electrical networks.

Economic markets. The “primal” variables are production levels and consumption levels, and the “dual” variables are prices of goods and services.

Structural design. The tensions on the beams are “primal” variables, and the nodal displacements are the “dual” variables.

Nonlinear (and linear) duality is extremely useful both as a conceptual and as a computational tool. For example, dual problems and their solutions are used in connection with:

(a) Identifying near-optimal solutions. A good dual solution can be used to bound the values of primal solutions, and so can be used to actually identify when a primal solution is optimal or near-optimal.

(b) Proving optimality. Using a strong duality theorem, one can prove optimality of a primal solution by constructing a dual solution with the same objective function value.

(c) Sensitivity analysis of the primal problem. The dual variable on a constraint represents the incremental change in the optimal solution value per unit increase in the right-hand-side (RHS) of the constraint.

(d) Karush-Kuhn-Tucker (KKT) conditions. The optimal solution to the dual problem is a vector of KKT multipliers.

(e) Convergence of algorithms. The dual problem is often used in the convergence analysis of algorithms.

(f) Discovering and Exploiting Problem Structure. Quite often, the dual problem has some useful mathematical, geometric, or computational structure that can be exploited in computing solutions to both the primal and the dual problem.

2 The Dual Problem

2.1 Constructing the Dual Problem

Recall the basic constrained optimization model:

OP : minimumx f(x)

s.t. g1(x) ≤ 0 (or = 0, or ≥ 0)

...

gm(x) ≤ 0 (or = 0, or ≥ 0)

x ∈ X .

In this model, we have f(x) : Rn → R and gi(x) : Rn → R, i = 1, . . . , m.

We can always convert the constraints involving gi(x) to have the format “gi(x) ≤ 0”, and so we will now presume that our problem is of the form:


OP : z∗ = minimumx f(x)

s.t. g1(x) ≤ 0

...

gm(x) ≤ 0

x ∈ X .

We will refer to X as the “ground-set”. Whereas in previous developments X was often assumed to be Rn, in our study of duality it will be useful to presume that X itself is composed of a structural part of the feasible region of the optimization problem. A typical example is when the feasible region of our problem of interest is the intersection of k inequality constraints:

F = {x ∈ Rn : gi(x) ≤ 0, i = 1, . . . , k} .

Depending on the structural properties of the functions gi(·), we might wish to re-order the constraints so that the first m of these k constraints (where of course m ≤ k) are kept explicitly as inequalities, and the last k − m constraints are formulated into the ground-set

X := {x ∈ Rn : gi(x) ≤ 0, i = m+ 1, . . . , k} .

Then it follows that:

F = {x ∈ Rn : gi(x) ≤ 0, i = 1, . . . , m} ∩ X ,

where now there are only m explicit inequality constraints.

More generally, the ground-set X could be of any form, including the following:


(i) X = {x ∈ Rn : x ≥ 0},

(ii) X = K where K is a convex cone,

(iii) X = {x | gi(x) ≤ 0, i = m+ 1, . . . ,m+ k},

(iv) X = {x | x ∈ Zn+} ,

(v) X = {x ∈ Rn : Nx = b, x ≥ 0} where “Nx = b” models flows in a network,

(vi) X = {x | Ax ≤ b},

(vii) X = Rn .

We now describe how to construct the dual problem of OP. The first step is to form the Lagrangian function. For a nonnegative vector u, the Lagrangian function is defined simply as follows:

L(x, u) := f(x) + uT g(x) = f(x) + u1g1(x) + · · · + umgm(x) .

The Lagrangian essentially takes the constraints out of the description of the feasible region of the model, and instead places them in the objective function with multipliers ui for i = 1, . . . , m. It might be helpful to think of these multipliers as costs or prices or penalties.

The second step is to construct the dual function L∗(u), which is defined as follows:

L∗(u) := minimumx∈X L(x, u) = minimumx∈X f(x) + uT g(x) .   (1)


Note that L∗(u) is defined for any given u. In order for the dual problem to be computationally viable, it will be necessary to be able to compute L∗(u) efficiently for any given u. Put another way, it will be necessary to be able to solve the optimization problem (1) efficiently for any given u.

The third step is to write down the dual problem D, which is defined as follows:

D : v∗ = maximumu L∗(u)

s.t. u ≥ 0 .

2.2 Summary: Steps in the Construction of the Dual Problem

Here is a summary of the process of constructing the dual problem of OP. The starting point is the primal problem:

OP : z∗ = minimumx f(x)

s.t. gi(x) ≤ 0, i = 1, . . . ,m

x ∈ X .

The dual problem D is constructed in the following three-step procedure:

1. Create the Lagrangian:

L(x, u) := f(x) + uT g(x) .


2. Create the dual function:

L∗(u) := minimumx f(x) + uT g(x)

s.t. x ∈ X .

3. Create the dual problem:

D : v∗ = maximumu L∗(u)

s.t. u ≥ 0 .
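The three steps above can be sketched numerically. The toy instance below (min x² subject to 1 − x ≤ 0 with X = R) is made up for illustration, and a crude grid search stands in for an actual method of maximizing L∗(u):

```python
# Illustrative three-step dual construction for the made-up instance
#   OP: min f(x) = x^2  s.t.  g(x) = 1 - x <= 0,  X = R.

# Step 1: the Lagrangian L(x, u) = f(x) + u g(x).
def L(x, u):
    return x * x + u * (1.0 - x)

# Step 2: the dual function; for fixed u the inner minimum over x in R
# is attained at x = u/2 (set the derivative 2x - u to zero).
def L_star(u):
    return L(u / 2.0, u)

# Step 3: the dual problem D: maximize L*(u) over u >= 0
# (crude grid search, purely for illustration).
us = [i / 1000.0 for i in range(4001)]          # u in [0, 4]
u_best = max(us, key=L_star)
v_star = L_star(u_best)

# The primal optimum is x* = 1 with z* = 1; here z* = v* (no duality gap).
assert abs(u_best - 2.0) < 1e-9
assert abs(v_star - 1.0) < 1e-9
```

Here L∗(u) = u − u²/4, which is maximized at u = 2 with value 1, matching the primal optimal value.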

2.3 Example: the Dual of a Linear Problem

Consider the linear optimization problem:

LP : minimumx cTx

s.t. Ax ≥ b

x ≥ 0 .

We can rearrange the inequality constraints and re-write LP as:

LP : minimumx cTx

s.t. b−Ax ≤ 0

x ≥ 0 .


Let us define X := {x ∈ Rn : x ≥ 0} and identify the linear inequality constraints “b − Ax ≤ 0” as the constraints “gi(x) ≤ 0, i = 1, . . . , m”. Therefore the Lagrangian for this problem is:

L(x, u) = cTx+ uT (b−Ax)

= uT b+ (c−ATu)Tx .

The dual function L∗(u) is defined to be:

L∗(u) = minx∈X L(x, u)

= minx≥0 L(x, u) .

Let us now evaluate L∗(u). For a given value of u, L∗(u) is evaluated as follows:

L∗(u) = minx≥0 L(x, u)

= minx≥0 uT b + (c − ATu)Tx

=  uT b   if ATu ≤ c
   −∞     otherwise ,

since if any component of c − ATu is negative, the corresponding component of x can be driven to +∞.

The dual problem (D) is defined to be:

(D) v∗ = maxu≥0 L∗(u) ,

and is therefore constructed as:

(D) v∗ = maxu uT b

s.t. ATu ≤ c

u ≥ 0 .
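This primal–dual pair can be spot-checked numerically. The sketch below uses a made-up two-variable instance (c = (2, 3), A = [1 1], b = 4) whose primal and dual optima can be read off by hand, and verifies weak duality on random feasible points:

```python
# Made-up LP instance:  min 2x1 + 3x2  s.t.  x1 + x2 >= 4, x >= 0.
# Its dual per the construction above: max 4u  s.t.  u <= 2, u <= 3, u >= 0.
import random

c = (2.0, 3.0)

def primal_feasible(x):
    return x[0] >= 0 and x[1] >= 0 and x[0] + x[1] >= 4

def dual_feasible(u):
    return 0 <= u <= 2          # A^T u <= c reduces to u <= 2 and u <= 3

random.seed(0)
for _ in range(1000):
    x = (random.uniform(0, 10), random.uniform(0, 10))
    u = random.uniform(0, 2)
    if primal_feasible(x) and dual_feasible(u):
        # weak duality: every feasible primal value bounds every dual value
        assert c[0] * x[0] + c[1] * x[1] >= 4 * u - 1e-9

# strong duality at the hand-computed optima x = (4, 0), u = 2:
z_star = c[0] * 4 + c[1] * 0
v_star = 4 * 2
assert z_star == v_star == 8
```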


2.4 Remarks on Conventions involving ±∞

It is often the case when constructing dual problems that a function such as L∗(u) takes the value ±∞ for certain values of u. Here we discuss conventions in this regard.

Suppose we have an optimization problem:

(P) : z∗ = maximumx f(x)

s.t. x ∈ S .

We could define the function:

h(x) :=  f(x)   if x ∈ S
         −∞     if x /∈ S .

Then we can rewrite our problem as:

(P) : z∗ = maximumx h(x)

s.t. x ∈ Rn .

Conversely, suppose that we have a function k(·) that takes on the value −∞ outside of a certain region S, but that k(·) is finite for all x ∈ S. Then the problem:

(P) : z∗ = maximumx k(x)

s.t. x ∈ Rn

is equivalent to:

(P) : z∗ = maximumx k(x)

s.t. x ∈ S .

Similar logic applies to minimization problems over domains where the function values might take on the value +∞.


3 More Examples of Dual Constructions of Optimization Problems

3.1 The Dual of a Convex Quadratic Problem

Consider the following convex quadratic optimization problem:

QP : minimumx (1/2)xTQx + cTx

s.t. Ax ≥ b ,

where Q is symmetric and positive definite.

To construct a dual of this problem, let us re-write the inequality constraints as b − Ax ≤ 0 and dualize these constraints. We construct the Lagrangian:

L(x, u) = (1/2)xTQx + cTx + uT (b − Ax)

= uT b + (c − ATu)Tx + (1/2)xTQx .

The dual function L∗(u) is:

L∗(u) = minx∈Rn L(x, u)

= minx∈Rn uT b + (c − ATu)Tx + (1/2)xTQx

= uT b + minx∈Rn (c − ATu)Tx + (1/2)xTQx .

For a given value of u, let x̄ be the optimal value of x in this last expression. Then since the expression is a convex quadratic function of x, it follows that x̄ must satisfy:

0 = ∇xL(x̄, u) = (c − ATu) + Qx̄ ,

whereby x̄ is given by:


x̄ = −Q−1(c − ATu) .

Substituting this value of x̄ into the Lagrangian, we obtain:

L∗(u) = L(x̄, u) = uT b − (1/2)(c − ATu)TQ−1(c − ATu) .

The dual problem (D) is defined to be:

(D) v∗ = maxu≥0 L∗(u) ,

and is therefore constructed as:

(D) v∗ = maxu uT b − (1/2)(c − ATu)TQ−1(c − ATu)

s.t. u ≥ 0 .
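Because the inner minimizer x̄ = −Q⁻¹(c − ATu) is explicit, the closed-form dual function is easy to check numerically. The sketch below uses an illustrative instance with Q = I (so Q⁻¹ = I) and verifies that x̄ attains the inner minimum and matches the closed form:

```python
# Illustrative data with Q = I; a single constraint x1 + x2 >= 3.
import random

c = [1.0, -2.0]
a = [1.0, 1.0]       # the single row of A
b = 3.0

def L(x, u):
    quad = 0.5 * (x[0] ** 2 + x[1] ** 2)       # (1/2) x^T Q x with Q = I
    lin = c[0] * x[0] + c[1] * x[1]
    return quad + lin + u * (b - a[0] * x[0] - a[1] * x[1])

def L_star(u):
    # u^T b - (1/2)(c - A^T u)^T Q^{-1} (c - A^T u), with Q = I
    r = [c[0] - a[0] * u, c[1] - a[1] * u]
    return u * b - 0.5 * (r[0] ** 2 + r[1] ** 2)

u = 0.7
x_bar = [-(c[0] - a[0] * u), -(c[1] - a[1] * u)]   # x_bar = -Q^{-1}(c - A^T u)
assert abs(L(x_bar, u) - L_star(u)) < 1e-12        # x_bar attains L*(u)

random.seed(1)
for _ in range(1000):                               # no sampled x does better
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    assert L(x, u) >= L_star(u) - 1e-9
```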

3.2 Remarks on Problems with Different Formats of Constraints

Suppose that our problem has some inequality and equality constraints. Then just as in the case of linear optimization, we assign nonnegative dual variables ui to constraints of the form gi(x) ≤ 0, unrestricted dual variables ui to equality constraints gi(x) = 0, and non-positive dual variables ui to constraints of the form gi(x) ≥ 0. For example, suppose our problem is:

OP : minimumx f(x)

s.t. gi(x) ≤ 0 , i ∈ L

gi(x) ≥ 0 , i ∈ G

gi(x) = 0 , i ∈ E

x ∈ X .


Then the Lagrangian takes the form:

L(x, u) := f(x) + uT g(x) = f(x) + ∑i∈L uigi(x) + ∑i∈G uigi(x) + ∑i∈E uigi(x) ,

and the dual function L∗(u) is constructed as:

L∗(u) := minimumx f(x) + uT g(x)

s.t. x ∈ X .

The dual problem is then defined to be:

D : v∗ = maximumu L∗(u)

s.t. ui ≥ 0 , i ∈ L

ui ≤ 0 , i ∈ G

ui ∈ R , i ∈ E .

For simplicity, we will typically presume that the constraints of OP are of the form gi(x) ≤ 0 . This is only for simplicity of exposition, as all of the results discussed herein extend to the general case just described.

3.3 The Dual of a Log-Barrier Problem

Consider the following logarithmic barrier problem:


BP : minimumx1,x2,x3 5x1 + 7x2 − 4x3 − ln(x1) − ln(x2) − ln(x3)

s.t. x1 + 3x2 + 12x3 = 37

x1 > 0, x2 > 0, x3 > 0 .

Let us define the ground-set as X = {x ∈ Rn : xj > 0, j = 1, . . . , n}, and let us dualize on the single equality constraint. The Lagrangian function takes on the form:

L(x, u) = 5x1 + 7x2 − 4x3 − ln(x1) − ln(x2) − ln(x3) + u(37 − x1 − 3x2 − 12x3)

= 37u + (5 − u)x1 + (7 − 3u)x2 + (−4 − 12u)x3 − ln(x1) − ln(x2) − ln(x3) .

The dual function L∗(u) is constructed as:

L∗(u) = minx∈X L(x, u)

= minx>0 37u + (5 − u)x1 + (7 − 3u)x2 + (−4 − 12u)x3 − ln(x1) − ln(x2) − ln(x3)

= 37u + minx>0 (5 − u)x1 + (7 − 3u)x2 + (−4 − 12u)x3 − ln(x1) − ln(x2) − ln(x3) .

Now notice that the optimization problem above separates into three univariate optimization problems, each of a linear function minus a logarithm term, one for each of the three positive variables x1, x2, and x3. Examining x1, it holds that the minimization value will be −∞ if (5 − u) ≤ 0, as we could set x1 arbitrarily large. When (5 − u) > 0, the problem of minimizing (5 − u)x1 − ln(x1) is a convex optimization problem whose solution is given by setting the first derivative with respect to x1 equal to zero. This means solving:

(5 − u) − 1/x1 = 0 ,

or in other words, setting

x1 = 1/(5 − u) .

Substituting this value of x1, we obtain:

(5 − u)x1 − ln(x1) = 1 − ln(1/(5 − u)) = 1 + ln(5 − u) .

Using parallel logic for the other two variables, we arrive at:

L∗(u) =  37u + 3 + ln(5 − u) + ln(7 − 3u) + ln(−4 − 12u)   if u < −1/3
         −∞                                                 otherwise.

Notice that L∗(u) is finite whenever 5 − u > 0, 7 − 3u > 0, and −4 − 12u > 0, but that these three inequalities in u are equivalent to the single inequality u < −1/3 .

The dual problem (D) is defined to be:

(D) v∗ = maxu∈R L∗(u) ,

and is therefore constructed as:


(D) v∗ = maxu 37u+ 3 + ln(5− u) + ln(7− 3u) + ln(−4− 12u)

s.t. u < −1/3 .
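For this barrier problem, strong duality can be verified numerically: the derivative of L∗(u) equals 37 minus the constraint value at the inner minimizers x1 = 1/(5 − u), x2 = 1/(7 − 3u), x3 = 1/(−4 − 12u), so at the dual optimum those minimizers become primal feasible. A sketch using bisection (no solver assumed):

```python
import math

# Dual function of BP, valid for u < -1/3 (derived in the text).
def L_star(u):
    return 37 * u + 3 + math.log(5 - u) + math.log(7 - 3 * u) + math.log(-4 - 12 * u)

# Inner minimizers from the text.
def x_of(u):
    return (1 / (5 - u), 1 / (7 - 3 * u), 1 / (-4 - 12 * u))

def f(x):
    return 5 * x[0] + 7 * x[1] - 4 * x[2] - sum(math.log(xj) for xj in x)

# L*'(u) = 37 - (x1 + 3 x2 + 12 x3) at the inner minimizers; find its root
# by bisection (L* is concave, so the root is the dual maximizer).
def slope(u):
    x = x_of(u)
    return 37 - (x[0] + 3 * x[1] + 12 * x[2])

lo, hi = -10.0, -0.34          # slope(lo) > 0 > slope(hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if slope(mid) > 0:
        lo = mid
    else:
        hi = mid
u_star = 0.5 * (lo + hi)
x_star = x_of(u_star)

# At u*, x(u*) is primal feasible and there is no duality gap.
assert abs(x_star[0] + 3 * x_star[1] + 12 * x_star[2] - 37) < 1e-6
assert f(x_star) >= L_star(u_star) - 1e-9      # weak duality
assert abs(f(x_star) - L_star(u_star)) < 1e-6  # strong duality
```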

3.4 The Dual of a Standard Conic Primal Problem

We consider the following convex optimization problem in standard conic form:

CP : z∗ = minimumx cTx

s.t. Ax = b

x ∈ K ,

where K is a closed convex cone.

Let K∗ denote the dual cone of the closed convex cone K ⊂ Rn, defined by:

K∗ := {y ∈ Rn | yTx ≥ 0 for all x ∈ K} .

Recall the following properties of K∗:

Proposition 3.1 If K is a nonempty closed convex cone, then

1. K∗ is a nonempty closed convex cone, and

2. (K∗)∗ = K .

Let us consider K to be the ground-set (X = K), and we therefore dualize on the constraints “Ax = b” of CP, and consider the Lagrangian:

L(x, u) := cTx+ uT (b−Ax) .


The dual function is:

L∗(u) := minx∈K L(x, u) = minx∈K cTx + uT (b − Ax) .

It then follows that:

L∗(u) = minx∈K cTx + uT (b − Ax)

= uT b + minx∈K (c − ATu)Tx

=  bTu   if c − ATu ∈ K∗
   −∞    if c − ATu /∈ K∗ .

We therefore write down the dual problem as:

CD : v∗ = maximumu,s bTu

s.t. ATu+ s = c

s ∈ K∗ .

Remark 1 It is a straightforward exercise to show that the conic dual of CD is CP.
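The dichotomy in the computation of L∗(u) above (finite and equal to bTu when c − ATu ∈ K∗, and −∞ otherwise) can be illustrated with the simplest cone K = R²₊, for which K∗ = R²₊. All data below is made up:

```python
# K = R^2_+, so K* = R^2_+; membership of s = c - A^T u in K* decides
# whether L*(u) is finite.
c = [3.0, 1.0]
a = [1.0, 2.0]        # single equality constraint a^T x = b
b = 4.0

def s_of(u):
    return [c[0] - a[0] * u, c[1] - a[1] * u]

def lagrangian_min_sample(u, T):
    # b*u + min of (c - A^T u)^T x over sample points of K with entries up to T
    s = s_of(u)
    best = 0.0                        # x = 0 lies in K
    for x in [(T, 0.0), (0.0, T), (T, T)]:
        best = min(best, s[0] * x[0] + s[1] * x[1])
    return b * u + best

u_ok = 0.4                            # s = (2.6, 0.2) in K*: L*(u) = b^T u
assert min(s_of(u_ok)) >= 0
assert abs(lagrangian_min_sample(u_ok, 1e6) - b * u_ok) < 1e-9

u_bad = 1.0                           # s = (2.0, -1.0) not in K*: unbounded below
assert min(s_of(u_bad)) < 0
assert lagrangian_min_sample(u_bad, 1e6) < lagrangian_min_sample(u_bad, 1e3) < 0
```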

4 The Column Geometry of the Primal and Dual Problems

In this section we develop a simple interpretation of the dual problem in the space of “resources and costs”, as follows.

For a given x ∈ X, the primal problem specifies an array of resources and costs associated with x, namely:


(s, z) = (s1, s2, . . . , sm, z) = (g1(x), g2(x), . . . , gm(x), f(x)) = (g(x), f(x)) .

We can think of each of these arrays as an array of resources and cost in Rm+1. Define the set I:

I := {(s, z) ∈ Rm+1 | there exists x ∈ X for which s ≥ g(x) and z ≥ f(x)} .

This region is illustrated in Figure 1.

Figure 1: The set I.

Proposition 4.1 If X is a convex set, and f(·), g1(·), . . . , gm(·) are convex functions on X, then I is a convex set.


Now define:

Hu,α := {(s, z) ∈ Rm+1 : uT s+ z = α} .

We say that Hu,α is a lower support of I if

uT s+ z ≥ α for all (s, z) ∈ I .

Let us also define the line:

L := {(s, z) ∈ Rm+1 : s = 0} .

Note that Hu,α ∩ L = {(0, α)} .

Now consider the problem of determining the hyperplane Hu,α that is a lower support of I and whose intersection with L is the highest, that is, whose value α is maximized. This problem is:


maximumu,α α

s.t. Hu,α is a lower support of I

= maximumu,α α

s.t. uT s+ z ≥ α for all (s, z) ∈ I

= maximumu≥0,α α

s.t. uT s + z ≥ α for all (s, z) ∈ I

(we may restrict to u ≥ 0 here because if (s, z) ∈ I then (s + βei, z) ∈ I for all β > 0, so any lower support of I must have ui ≥ 0 for each i)

= maximumu≥0,α α

s.t. uT g(x) + f(x) ≥ α for all x ∈ X


= maximumu≥0,α α

s.t. L(x, u) ≥ α for all x ∈ X

= maximumu≥0,α α

s.t. minx∈X L(x, u) ≥ α

= maximumu≥0,α α

s.t. L∗(u) ≥ α

= maximumu L∗(u)

s.t. u ≥ 0 .

This last expression is exactly the dual problem (D), whose value is given by v∗. We summarize this observation as:

Remark 1 The dual problem corresponds to finding a hyperplane Hu,α that is a lower support of I, and whose intersection with L is the highest. This highest value is exactly the value of the dual problem, namely v∗.


5 The Dual is a Concave Maximization Problem

Theorem 5.1 The dual function L∗(u) is a concave function on its domain S = {u ∈ Rm : u ≥ 0}.

Proof: Let u1 ≥ 0 and u2 ≥ 0 be two values of the dual variables, and let u = λu1 + (1 − λ)u2, where λ ∈ [0, 1]. Then

L∗(u) = minx∈X f(x) + uT g(x)

= minx∈X λ[f(x) + (u1)T g(x)] + (1 − λ)[f(x) + (u2)T g(x)]

≥ λ[minx∈X f(x) + (u1)T g(x)] + (1 − λ)[minx∈X f(x) + (u2)T g(x)]

= λL∗(u1) + (1 − λ)L∗(u2) .

Therefore L∗(u) is a concave function.
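The proof never uses convexity of f, g, or X: L∗(u) is a pointwise minimum of functions affine in u, one per x ∈ X, and so is concave no matter what. A sketch with a random finite ground-set (all data below is made up for illustration):

```python
# L*(u) = min over x in X of [f(x) + u g(x)] is a pointwise minimum of
# affine functions of u, hence concave regardless of the structure of
# f, g, or X.
import random

random.seed(42)
X = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(50)]

def f(x):
    return x[0] ** 2 - x[1]          # any functions work here

def g(x):
    return x[0] + 2 * x[1] - 1       # single constraint, m = 1

def L_star(u):
    return min(f(x) + u * g(x) for x in X)

for _ in range(200):
    u1, u2 = random.uniform(0, 5), random.uniform(0, 5)
    lam = random.random()
    u = lam * u1 + (1 - lam) * u2
    # concavity: L*(lam*u1 + (1-lam)*u2) >= lam*L*(u1) + (1-lam)*L*(u2)
    assert L_star(u) >= lam * L_star(u1) + (1 - lam) * L_star(u2) - 1e-9
```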

6 Weak Duality

Let z∗ and v∗ be the optimal values of the primal and the dual problems:

OP : z∗ = minimumx f(x)

s.t. gi(x) ≤ 0, i = 1, . . . ,m

x ∈ X ,

D : v∗ = maximumu L∗(u)

s.t. u ≥ 0 .


Theorem 6.1 Weak Duality Theorem: If x̄ is feasible for OP and ū is feasible for D, then

f(x̄) ≥ L∗(ū) .

In particular, z∗ ≥ v∗ .

Proof: If x̄ is feasible for OP and ū is feasible for D, then ū ≥ 0 and g(x̄) ≤ 0, whereby

f(x̄) ≥ f(x̄) + ūT g(x̄) ≥ minx∈X f(x) + ūT g(x) = L∗(ū) .

Taking the minimum over feasible x̄ and the maximum over feasible ū yields z∗ ≥ v∗.

Corollary 6.1 If x̄ is feasible for OP and ū ≥ 0 is feasible for D, and f(x̄) = L∗(ū), then x̄ and ū are optimal solutions of OP and D, respectively.

Corollary 6.2 If z∗ = −∞, then D has no feasible solution.

Corollary 6.3 If v∗ = +∞, then OP has no feasible solution.

A strong duality theorem cannot necessarily be established for a general nonlinear optimization problem without adding assumptions regarding convexity and the existence of a Slater point. This will be developed in Section 8.

7 Saddlepoint Optimality Criteria

The pair (x̄, ū) is called a saddlepoint of the Lagrangian L(x, u) if x̄ ∈ X, ū ≥ 0, and

L(x̄, u) ≤ L(x̄, ū) ≤ L(x, ū) for all x ∈ X and u ≥ 0 .

Theorem 7.1 A pair (x̄, ū) is a saddlepoint of the Lagrangian if and only if the following three conditions are satisfied:


(i) L(x̄, ū) = L∗(ū) ,

(ii) g(x̄) ≤ 0, and

(iii) ūT g(x̄) = 0 .

Furthermore, (x̄, ū) is a saddlepoint of L(x, u) if and only if the following two conditions are satisfied:

(I) x̄ and ū are, respectively, optimal solutions of OP and D, and

(II) there is no duality gap, i.e., z∗ = f(x̄) = L∗(ū) = v∗ .

Proof: Suppose that (x̄, ū) is a saddlepoint. By definition of the saddlepoint, condition (i) must be true. Also by definition of the saddlepoint, for any u ≥ 0,

f(x̄) + ūT g(x̄) ≥ f(x̄) + uT g(x̄) .

This implies that g(x̄) ≤ 0, since otherwise the above inequality could be violated by making the corresponding component of u ≥ 0 arbitrarily large. This establishes (ii). Taking u = 0, we get ūT g(x̄) ≥ 0; since ū ≥ 0 and g(x̄) ≤ 0 also give ūT g(x̄) ≤ 0, it follows that ūT g(x̄) = 0, establishing (iii).

Also, note from the above observations that x̄ and ū are feasible for OP and D, respectively, and f(x̄) = L(x̄, ū) = L∗(ū), which implies that they are optimal solutions of their respective problems and that there is no duality gap, thus establishing (I) and (II).

Suppose now that x̄ and ū ≥ 0 satisfy conditions (i), (ii), and (iii). Then L(x̄, ū) ≤ L(x, ū) for all x ∈ X by (i). Furthermore, for any u ≥ 0 we have:

L(x̄, ū) = f(x̄) ≥ f(x̄) + uT g(x̄) = L(x̄, u) .

Thus (x̄, ū) is a saddlepoint of L(x, u).

Finally, suppose x̄ and ū are optimal solutions of OP and D with no duality gap. Primal and dual feasibility imply that

L∗(ū) ≤ L(x̄, ū) = f(x̄) + ūT g(x̄) ≤ f(x̄) .

Since there is no duality gap, equality holds throughout, implying conditions (i), (ii), and (iii), and hence (x̄, ū) is a saddlepoint of L(x, u).


Remark 2 Note that up until this point, we have made no assumptions about the functions f(x), g1(x), . . . , gm(x) or the structure of the set X.

Suppose that our constrained optimization problem OP is a convex program, namely f(x), g1(x), . . . , gm(x) are convex functions, and that X = Rn:

OP : z∗ = minimumx f(x)

s.t. g(x) ≤ 0

x ∈ Rn .

Suppose that x̄ together with ū satisfy the KKT conditions:

(i) ∇f(x̄) + ūT∇g(x̄) = 0

(ii) g(x̄) ≤ 0

(iii) ū ≥ 0

(iv) ūT g(x̄) = 0 .

Then because f(x), g1(x), . . . , gm(x) are convex functions, condition (i) is equivalent to “L(x̄, ū) = minx∈X L(x, ū) = L∗(ū)” for X = Rn. Therefore the KKT conditions imply conditions (i), (ii), and (iii) of Theorem 7.1, and so we have:

Theorem 7.2 Suppose X = Rn, and f(x), g1(x), . . . , gm(x) are convex differentiable functions, and x̄ together with ū satisfy the KKT conditions. Then x̄ and ū are optimal solutions of OP and D, with no duality gap.
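Theorem 7.2 can be checked mechanically on a small instance. The toy problem below (min x² subject to 1 − x ≤ 0 with X = R, made up for illustration) has the KKT pair x̄ = 1, ū = 2:

```python
# Toy convex program: min f(x) = x^2  s.t.  g(x) = 1 - x <= 0,  X = R.
# Candidate KKT pair: x_bar = 1, u_bar = 2.
x_bar, u_bar = 1.0, 2.0

f = lambda x: x * x
g = lambda x: 1.0 - x
grad_f = lambda x: 2.0 * x
grad_g = lambda x: -1.0

assert abs(grad_f(x_bar) + u_bar * grad_g(x_bar)) < 1e-12   # (i) stationarity
assert g(x_bar) <= 0                                        # (ii) primal feasibility
assert u_bar >= 0                                           # (iii) dual feasibility
assert abs(u_bar * g(x_bar)) < 1e-12                        # (iv) complementarity

# No duality gap: L*(u_bar) = min_x [x^2 + u_bar(1 - x)], attained at x = u_bar/2.
L_star_val = (u_bar / 2) ** 2 + u_bar * (1 - u_bar / 2)
assert abs(L_star_val - f(x_bar)) < 1e-12                   # z* = v*
```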


8 Strong Duality for Convex Optimization Problems

In this section we consider the following convex problem:

OPE : z∗ = minimumx f(x)

s.t. gL(x) ≤ 0

gE(x) = 0

x ∈ X ,

where we assume here that f(x) and gi(x) are convex functions for i ∈ L, gi(x) are linear functions (and so can be written as gi(x) = bi − Aix) for i ∈ E, and X is a convex set. We use the notation gL(x) for the vector of values of gi(x) for i ∈ L and similarly gE(x) for the vector of values of gi(x) for i ∈ E. With this notation we have:

gE(x) = 0 ⇔ Ax = b .

We further assume with no loss of generality that the matrix A has full row rank. (If A does not have full row rank, we can eliminate dependent rows to yield a new system with full row rank.) It follows from the full row rank of A that if ATu = 0, then u = 0.

Note that the standard conic optimization problem CP is a special case of OPE; this follows by setting f(x) := cTx, X := K, and having a vacuous set of convex inequalities (more formally, L := ∅).

Herein we explore and state sufficient conditions that will guarantee that the dual problem will have an optimal solution with no duality gap. For completeness, let us construct the dual problem for this format. We form the Lagrangian:

L(x, u) := f(x) + uT g(x) = f(x) + uTL gL(x) + uTE gE(x) ,

where uL and uE are vectors of multipliers indexed over the inequality constraints and equality constraints, respectively. Then the dual function L∗(u) is:

L∗(u) := minimumx f(x) + uT g(x)

s.t. x ∈ X .

The dual problem is then defined to be:

D : v∗ = maximumu L∗(u)

s.t. uL ≥ 0

uE ∈ R|E| ,

where |E| denotes the number of indices in the index set E.

8.1 The Perturbation Function

We now introduce the concept of perturbing the RHS vector of the optimization problem OPE. We perturb the RHS of OPE by an amount y = (yL, yE), and obtain the new problem OPEy defined as:

OPEy : z(y) = minimumx f(x)

s.t. gL(x) ≤ yL

gE(x) = yE

x ∈ X .

We call OPEy the perturbed primal problem and z(y) the perturbation function. Note that our original problem OPE is just OPE0 and that z∗ = z(0).


Next define:

Y := {(yL, yE) : there exists x ∈ X for which gL(x) ≤ yL and gE(x) = yE} .

Lemma 8.1 Y is a convex set, and z(y) is a convex function whose domain is Y .

Proof: Let y1, y2 ∈ Y and let y3 = λy1 + (1 − λ)y2 where λ ∈ [0, 1]. Let x1 be feasible for OPEy1 and x2 be feasible for OPEy2, and let x3 = λx1 + (1 − λ)x2. Then x3 is feasible for OPEy3, whereby y3 ∈ Y , proving that Y is convex.

To show that z(y) is a convex function, let y1, y2, y3 be as given above. The aim is to show that z(y3) ≤ λz(y1) + (1 − λ)z(y2). First assume that z(y1) and z(y2) are finite. For any ε > 0, there exist x1 and x2 feasible for OPEy1 and OPEy2, respectively, with f(x1) ≤ z(y1) + ε and f(x2) ≤ z(y2) + ε. Now let x3 = λx1 + (1 − λ)x2. Then x3 is feasible for OPEy3, and

z(y3) ≤ f(x3) ≤ λf(x1) + (1 − λ)f(x2) ≤ λ(z(y1) + ε) + (1 − λ)(z(y2) + ε) = λz(y1) + (1 − λ)z(y2) + ε .

As this is true for any ε > 0, it follows that z(y3) ≤ λz(y1) + (1 − λ)z(y2), and so z(y) is convex. If z(y1) and/or z(y2) is not finite, the proof goes through with appropriate modifications.

The perturbation function z(y) is illustrated in Figure 2. One immediate consequence of the convexity of z(·) is the existence of a subgradient of z(y) when y ∈ intY :

Corollary 8.1 For every ȳ ∈ intY , there exists a subgradient of z(·) at y = ȳ.
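For a one-dimensional toy instance (min x² subject to 1 − x ≤ y, made up for illustration), the perturbation function is z(y) = max(0, 1 − y)². It is convex, and −2 is a subgradient at y = 0, matching the dual optimum ū = 2 of this instance, a preview of part (ii) of Theorem 8.1 below:

```python
# Perturbation function of the made-up problem  min x^2  s.t.  1 - x <= y:
# the constraint becomes x >= 1 - y, so z(y) = max(0, 1 - y)^2.
def z(y):
    return max(0.0, 1.0 - y) ** 2

ys = [i / 100.0 - 2.0 for i in range(401)]        # grid on [-2, 2]

# z is convex: check midpoint convexity on the equally spaced grid.
for i in range(len(ys) - 2):
    assert z(ys[i + 1]) <= 0.5 * (z(ys[i]) + z(ys[i + 2])) + 1e-12

# -u_bar = -2 is a subgradient of z at y = 0 (u_bar = 2 is the dual optimum):
for y in ys:
    assert z(y) >= z(0.0) - 2.0 * y - 1e-12
```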

8.2 Slater Points and Strong Duality

A vector x0 is called a Slater point for OPE if x0 satisfies:

• gL(x0) < 0 ,


Figure 2: The perturbation function z(y).

• gE(x0) = 0, which is the same as Ax0 = b , and

• x0 ∈ intX .

Note that a Slater point is feasible for OPE and satisfies all constraints “strictly” in the sense of being interior, except for the equality constraints, which by definition must be satisfied at equality. Our main strong duality theorem, stated below, asserts that if OPE has a Slater point, then strong duality holds and the dual problem attains its optimum.

Theorem 8.1 (Strong Duality Theorem) Consider the problem OPE, where f(x) and gi(x) are convex functions for i ∈ L, gi(x) are linear functions for i ∈ E, and X is a convex set. If z∗ is finite and if OPE has a Slater point, then:

(i) D attains its optimum at some ū = (ūL, ūE) .


(ii) ū is an optimal solution of D if and only if −ū ∈ ∂z(0), where z(y) is the perturbation function.

(iii) z∗ = v∗.

(iv) Let ū be any dual optimal solution. The set of primal optimal solutions (if any) is characterized as the intersection S1 ∩ S2, where:

S1 = {x ∈ X | gL(x) ≤ 0 , gE(x) = 0 , and ūT g(x) = 0} , and
S2 = {x ∈ X | L(x, ū) = L∗(ū)} .

Proof: Define the following set:

S := {(wL, wE , t) | there exists x ∈ X for which gL(x) ≤ wL, gE(x) = wE , and f(x) < z∗ + t} ,

and notice that S is a nonempty convex set. Also notice that (wL, wE, t) = (0, 0, 0) /∈ S, and so there is a hyperplane separating (0, 0, 0) from S. This means that there exists (ū, θ) = (ūL, ūE, θ) ≠ 0 and a scalar α for which

ūTL wL + ūTE wE + θt ≥ α for all (wL, wE, t) ∈ S ,

and

ūTL 0 + ūTE 0 + θ · 0 ≤ α .

The second system above implies that α ≥ 0. This together with the definition of S implies that the first system can be rewritten as:

ūTL(gL(x) + sL) + ūTE gE(x) + θ(f(x) − z∗ + δ) ≥ 0 for all x ∈ X, sL ≥ 0, and δ > 0 .

This then implies that ūL ≥ 0 and θ ≥ 0. If θ > 0 then we can assume (by re-scaling if necessary) that θ = 1, and then setting sL = 0 we obtain:

ūTL gL(x) + ūTE gE(x) + f(x) ≥ z∗ − δ for all x ∈ X and δ > 0 .

This in turn implies that L∗(ū) ≥ z∗ − δ for all δ > 0, and hence L∗(ū) ≥ z∗. Furthermore, ūL ≥ 0 implies that ū is feasible for D, whereby v∗ ≥ L∗(ū) ≥ z∗. But from weak duality we know that v∗ ≤ z∗, and hence v∗ = L∗(ū) = z∗ and ū solves D. This proves (i) and (iii) in the case that θ > 0. We now show, by contradiction, that indeed θ > 0, which will complete the proofs of (i) and (iii).


Suppose that θ = 0. Then setting sL = 0 we obtain:

ūTL gL(x) + ūTE gE(x) ≥ 0 for all x ∈ X .

Substituting in the value of the Slater point x = x0 yields:

ūTL gL(x0) + ūTE gE(x0) ≥ 0 .

But gE(x0) = 0 and gL(x0) < 0, which when combined with ūL ≥ 0 implies that ūL = 0. Therefore we have:

ūTE gE(x) ≥ 0 for all x ∈ X .

In particular, since x0 ∈ intX, it follows for some ε > 0 that x0 + d ∈ X whenever ∥d∥ ≤ ε, whereby

ūTE gE(x0 + d) ≥ 0 whenever ∥d∥ ≤ ε .

We can write this out in matrix form as ūTE(b − Ax0 − Ad) ≥ 0 whenever ∥d∥ ≤ ε. However, Ax0 = b, which implies that ūTE Ad ≤ 0 for any d satisfying ∥d∥ ≤ ε, and hence (by scaling) ūTE Ad ≤ 0 for any d. Setting d := ATūE yields ∥ATūE∥2 = ūTE Ad ≤ 0, which implies that ATūE = 0, which in turn implies that ūE = 0 as a consequence of the assumption that A has full row rank. But then we have (ū, θ) = (ūL, ūE, θ) = 0, which contradicts the fact that this triplet must be nonzero. This completes the proofs of (i) and (iii).

We now show (ii), starting with (⇒) of (ii). Suppose ū is optimal for D. Then from (iii) we have

minx∈X {f(x) + ūT g(x)} = L∗(ū) = z∗ = z(0) .

Thus, f(x) + ūT g(x) ≥ z(0) for every x ∈ X.

For any given fixed y, if x ∈ X and gL(x) ≤ yL and gE(x) = yE, then because ūL ≥ 0 it holds that

f(x) + ūT y ≥ f(x) + ūT g(x) ≥ z(0) .

Thus for fixed y,

ūT y + z(y) = minimumx f(x) + ūT y ≥ z(0) ,
              s.t. gL(x) ≤ yL
                   gE(x) = yE
                   x ∈ X ,


that is,

z(y) ≥ z(0) − ūT (y − 0) ,

and so −ū ∈ ∂z(0), which proves (⇒) of (ii).

To show (⇐) of (ii), let −ū ∈ ∂z(0). Then

z(y) ≥ z(0) − ūT (y − 0) ,

for y ∈ Y . Noting that z(y) is non-increasing in yi for i ∈ L, let ei denote the ith unit vector for i ∈ L, and observe:

z(0) ≥ z(ei) ≥ z(0) − ūT (ei − 0) = z(0) − ūi ,

which shows that ūi ≥ 0 for i ∈ L, and hence ūL ≥ 0. Thus ū is feasible for D.

Next, for any x̄ ∈ X, let y = g(x̄). Then

z(g(x̄)) = minimum_x  f(x)  ≤ f(x̄) ,
           s.t.  g_L(x) ≤ g_L(x̄)
                 g_E(x) = g_E(x̄)
                 x ∈ X

which implies that

f(x̄) ≥ z(g(x̄)) ≥ z(0) − u^T (g(x̄) − 0) ,

that is,

z(0) ≤ f(x̄) + u^T g(x̄) .

Since x̄ ∈ X was arbitrary, this yields

z∗ = z(0) ≤ min_{x∈X} {f(x) + u^T g(x)} = L∗(u) .

By weak duality, however, z∗ ≥ L∗(u), so z∗ = L∗(u) and so u is optimal for D. This shows (⇐) of (ii), completing the proof of (ii).

We now show (iv). Suppose x ∈ S1 ∩ S2. Then u_L ≥ 0, x satisfies L∗(u) = L(x, u), and u^T g(x) = 0, whereby (x, u) satisfy the conditions of Theorem 7.1 and hence x is optimal for the primal problem OPE. Conversely, suppose x is optimal for the primal problem. Then z∗ = v∗, and so again


invoking Theorem 7.1 we have g_L(x) ≤ 0, g_E(x) = 0, L∗(u) = L(x, u), and u^T g(x) = 0. Therefore x ∈ S1 ∩ S2.

Let us see how Theorem 8.1 specializes to the case of conic optimization. Recall that the conic optimization primal problem format is:

CP : z∗ = minimum_x  c^T x
          s.t.  Ax = b
                x ∈ K ,

where K is a closed convex cone. In Section 3.4 we derived the conic dual problem:

CD : v∗ = maximum_{u,s}  b^T u
          s.t.  A^T u + s = c
                s ∈ K∗ ,

where K∗ is the dual cone of K. Note that CP is a special case of OPE with a vacuous set of inequality constraints and K as the ground-set, i.e., X = K. A Slater point for CP then is a point x^0 that satisfies:

Ax^0 = b and x^0 ∈ int K .

Theorem 8.1 then implies:

Corollary 8.2 (Strong Duality for CP) If CP has a Slater point, then v∗ = z∗ and the conic dual CD attains its optimum.

We can also define a Slater point for CD to be a point (u^0, s^0) that satisfies:

A^T u^0 + s^0 = c and s^0 ∈ int K∗ .

By recasting CD in the format of CP, one can prove:


Corollary 2 (Strong Duality for CD) If CD has a Slater point, then v∗ = z∗ and the primal attains its optimum.

Finally, these two corollaries combine to yield:

Corollary 3 (Strong Duality for CP and CD) If CP and CD each have a Slater point, then v∗ = z∗ and the primal and dual problems attain their optima.

8.3 An Example of a (Conic) Convex Problem with Finite Duality Gap

Consider the problem:

P : z∗ = minimum_{x1,x2,x3}  x_1
         s.t.  −x_2 + x_3 = 0
               −x_1 ≤ 10
               ∥(x_1, x_2)∥ ≤ x_3 ,

where ∥ · ∥ denotes the Euclidean norm. Note that this problem is a conic problem with K = Q^3 = {(x_1, x_2, x_3) : ∥(x_1, x_2)∥ ≤ x_3}, i.e., K is the second-order cone in R^3, and so can be formatted as:

P : z∗ = minimum_{x1,x2,x3}  x_1
         s.t.  −x_2 + x_3 = 0
               −x_1 ≤ 10
               (x_1, x_2, x_3) ∈ Q^3 .

Note that P is feasible (set x_1 = x_2 = x_3 = 0), and that the first and third constraints combine to force x_1 = 0 in any feasible solution, whereby


z∗ = 0. Also note that any primal feasible solution is of the form x = (0, β, β) for some β ≥ 0.

The ground-set X for this problem is X = K = Q^3. Forming the Lagrangian with multipliers u_1, u_2 on the two linear constraints, and remembering that X = K = {(x_1, x_2, x_3) | ∥(x_1, x_2)∥ ≤ x_3}, we obtain:

L(x_1, x_2, x_3, u_1, u_2) = x_1 + u_1(−x_2 + x_3) + u_2(−x_1 − 10) = −10u_2 + (1 − u_2, −u_1, u_1)^T (x_1, x_2, x_3) .

Then

L∗(u_1, u_2) = min_{∥(x1,x2)∥≤x3}  −10u_2 + (1 − u_2, −u_1, u_1)^T (x_1, x_2, x_3) .

It is straightforward to show that L∗(u_1, u_2) is given by:

L∗(u_1, u_2) =
    −10u_2   if ∥(1 − u_2, −u_1)∥ ≤ u_1
    −∞       if ∥(1 − u_2, −u_1)∥ > u_1 .

We therefore can write the dual problem as:

D : v∗ = maximum_{u1,u2}  −10u_2
         s.t.  ∥(1 − u_2, −u_1)∥ ≤ u_1
               u_1 ∈ R, u_2 ≥ 0 .

It is then straightforward to see that every feasible solution to D has u_2 = 1, whereby v∗ = −10 < 0 = z∗. Furthermore, the set of optimal solutions of D comprises the vectors (u_1, u_2) = (α, 1) for all α ≥ 0. Hence both P and D attain their optima and there is a finite duality gap z∗ − v∗ = 0 − (−10) = 10.
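As a quick numerical sanity check of these claims (not part of the original notes; the grid ranges and tolerance are arbitrary choices), one can scan multipliers (u_1, u_2) with u_2 ≥ 0, confirm that dual feasibility forces u_2 = 1, and recover v∗ = −10:

```python
import math

def L_star(u1, u2):
    # Dual function of the example: finite (= -10*u2) exactly when
    # ||(1 - u2, -u1)|| <= u1, i.e. when (1 - u2, -u1, u1) lies in Q3 = (Q3)*.
    if math.hypot(1.0 - u2, -u1) <= u1 + 1e-12:
        return -10.0 * u2
    return -math.inf

best = -math.inf
for i in range(501):                  # u1 on [0, 5]
    for j in range(501):              # u2 on [0, 5]
        u1, u2 = i / 100.0, j / 100.0
        v = L_star(u1, u2)
        if v > -math.inf:
            # feasibility requires (1 - u2)^2 <= 0, hence u2 = 1
            assert abs(u2 - 1.0) < 1e-6
            best = max(best, v)

print(best)   # -10.0 = v*, strictly below z* = 0
```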

Because there is a duality gap for this problem, the hypotheses of Theorem 8.1 cannot all hold. Let us examine this further. The interior of the primal ground-set X = K = Q^3 is int K = {(x_1, x_2, x_3) | ∥(x_1, x_2)∥ < x_3}. As discussed above, the only feasible solutions of P are of the form x = (x_1, x_2, x_3) = (0, β, β) for β ≥ 0, and for all such feasible solutions we have


∥(x_1, x_2)∥ = β = x_3, so indeed there is no primal feasible solution that lies in the interior of the ground-set X. Therefore the primal problem has no Slater point, and hence the conclusions of Theorem 8.1 do not apply to this problem instance.

We can formulate the perturbed problem and the perturbation function z(y) := z(y_1, y_2) as the value of:

P_(y1,y2) : z(y_1, y_2) := minimum_{x1,x2,x3}  x_1
                           s.t.  −x_2 + x_3 = 0 + y_1
                                 −x_1 ≤ 10 + y_2
                                 (x_1, x_2, x_3) ∈ Q^3 .

Let us examine the perturbation function z(y) in detail. Note that when y_1 < 0, any feasible solution of P_(y1,y2) has x_3 = x_2 + y_1 < x_2 ≤ |x_2| ≤ ∥(x_1, x_2)∥, whereby P_(y1,y2) is infeasible and so z(y) = +∞. When y_1 = 0 and y_2 ≥ −10, then x_1 = 0 in any feasible solution, and indeed (x_1, x_2, x_3) = (0, 0, 0) is feasible and hence optimal, and z(y) = 0. When y_1 = 0 and y_2 < −10, then x_1 > 0 in any feasible solution, but then x_3 ≥ ∥(x_1, x_2)∥ > |x_2| ≥ x_2 = x_3, which is not possible, hence P_(y1,y2) is infeasible and so z(y) = +∞. Last of all, when y_1 > 0, note that any feasible solution will have x_1 ≥ −10 − y_2, but also

(x_1, x_2, x_3) = ( −10 − y_2 , ((10 + y_2)^2 − y_1^2)/(2y_1) , ((10 + y_2)^2 + y_1^2)/(2y_1) )

is feasible and hence optimal, and z(y) = −10 − y_2. Therefore the perturbation function z(y) can be summarized as follows:

z(y_1, y_2) =
    +∞          if y_1 < 0
    0           if y_1 = 0, y_2 ≥ −10
    +∞          if y_1 = 0, y_2 < −10
    −10 − y_2   if y_1 > 0 .

Here we see that the domain of z(y) is Y := {y = (y_1, y_2) : y_1 = 0, y_2 ≥ −10} ∪ {y = (y_1, y_2) : y_1 > 0}. The point y = (0, 0) ∉ int Y, and indeed z(y) has no subgradient at y = (0, 0).
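The case analysis above can be checked numerically; the sketch below (not part of the notes; the sample points are arbitrary) verifies that, for y_1 > 0, the claimed solution is feasible for P_(y1,y2) and achieves the objective value z(y) = −10 − y_2:

```python
import math

def witness(y1, y2):
    # Claimed optimal solution of P_(y1,y2) for y1 > 0.
    x1 = -10.0 - y2
    x2 = ((10 + y2) ** 2 - y1 ** 2) / (2 * y1)
    x3 = ((10 + y2) ** 2 + y1 ** 2) / (2 * y1)
    return x1, x2, x3

for (y1, y2) in [(0.5, 0.0), (2.0, -3.0), (1.0, 4.0)]:
    x1, x2, x3 = witness(y1, y2)
    assert abs(-x2 + x3 - y1) < 1e-9           # equality constraint
    assert -x1 <= 10 + y2 + 1e-9               # inequality constraint
    assert math.hypot(x1, x2) <= x3 + 1e-9     # (x1, x2, x3) lies in Q3
    assert abs(x1 - (-10.0 - y2)) < 1e-9       # objective equals z(y)
print("all perturbation checks passed")
```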


9 Duality Strategies

9.1 Dualizing “Bad” Constraints

Suppose we wish to solve:

OP : minimum_x  f(x)

s.t. g(x) ≤ 0

Nx ≤ g .

Suppose that optimization over the constraints “Nx ≤ g” is easy, but that the addition of the constraints “g(x) ≤ 0” makes the problem much more difficult. This can happen, for example, when the constraints “Nx ≤ g” are network constraints, and when “g(x) ≤ 0” are non-network constraints in the model.

Let

X = {x | Nx ≤ g}

and re-write OP as:

OP : minimum_x  f(x)

s.t. g(x) ≤ 0

x ∈ X .

The Lagrangian is:

L(x, u) = f(x) + u^T g(x) ,


and the dual function is:

L∗(u) := minimum_x  f(x) + u^T g(x)
         s.t.  x ∈ X ,

which is

L∗(u) := minimum_x  f(x) + u^T g(x)
         s.t.  Nx ≤ g .

Notice that L∗(u) is easy to evaluate for any value of u if, say, f(x) and g_i(x), i = 1, . . . , m, are linear or convex and separable, and so in this case we can attempt to solve OP by designing an algorithm to solve the dual problem:

D : maximum_u  L∗(u)

s.t. u ≥ 0 .
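To make the strategy concrete, here is a minimal sketch (on a hypothetical toy instance, not from the notes) of evaluating L∗(u) in closed form and maximizing it by projected subgradient ascent, using the standard fact that g(x(u)) is a subgradient of L∗ at u when x(u) attains the inner minimum:

```python
# Toy instance (hypothetical): minimize x1^2 + x2^2 over X = R^2
# subject to the dualized constraint g(x) = 1 - x1 - x2 <= 0.
# L*(u) = min_x {x1^2 + x2^2 + u(1 - x1 - x2)} is attained at
# x1 = x2 = u/2, so L*(u) = u - u^2/2, with subgradient g(x(u)) = 1 - u.

def x_of_u(u):
    return (u / 2.0, u / 2.0)            # inner minimizer x(u)

def subgrad(u):
    x1, x2 = x_of_u(u)
    return 1.0 - x1 - x2                 # g evaluated at x(u)

u, step = 0.0, 0.5
for _ in range(100):
    u = max(0.0, u + step * subgrad(u))  # ascend, project onto u >= 0

print(round(u, 6))                       # 1.0: the dual optimum
```

Here v∗ = L∗(1) = 1/2 equals the primal optimum attained at x = (1/2, 1/2), as strong duality guarantees for this convex instance.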

9.2 Dualizing A Large Problem into Many Small Problems

Suppose we wish to solve:

OP : minimum_{x1,x2}  (c^1)^T x^1 + (c^2)^T x^2
     s.t.  B^1 x^1 + B^2 x^2 ≤ d
           A^1 x^1 ≤ b^1
           A^2 x^2 ≤ b^2


Notice here that if it were not for the constraints “B^1 x^1 + B^2 x^2 ≤ d”, we would be able to separate the problem into two independent problems. Let us dualize on these constraints. Let:

X = {(x^1, x^2) | A^1 x^1 ≤ b^1, A^2 x^2 ≤ b^2}

and re-write OP as:

OP : minimum_{x1,x2}  (c^1)^T x^1 + (c^2)^T x^2
     s.t.  B^1 x^1 + B^2 x^2 ≤ d
           (x^1, x^2) ∈ X

The Lagrangian is:

L(x, u) = (c^1)^T x^1 + (c^2)^T x^2 + u^T (B^1 x^1 + B^2 x^2 − d)
        = −u^T d + ((c^1)^T + u^T B^1) x^1 + ((c^2)^T + u^T B^2) x^2 ,

and the dual function is:

L∗(u) = minimum_{x1,x2}  −u^T d + ((c^1)^T + u^T B^1) x^1 + ((c^2)^T + u^T B^2) x^2
        s.t.  (x^1, x^2) ∈ X ,

which can be re-written as:

L∗(u) = −u^T d + minimum_{A^1 x^1 ≤ b^1}  ((c^1)^T + u^T B^1) x^1
                + minimum_{A^2 x^2 ≤ b^2}  ((c^2)^T + u^T B^2) x^2 .


Notice once again that L∗(u) is easy to evaluate for any value of u, and so we can attempt to solve OP by designing an algorithm to solve the dual problem:

D : maximum_u  L∗(u)

s.t. u ≥ 0 .
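A minimal sketch of this decomposition (on a hypothetical toy instance where each block is a linear minimization over the box 0 ≤ x ≤ 1, so the inner minima are solved coordinate-by-coordinate; the data below are arbitrary):

```python
# Each block i minimizes its reduced cost (c^i + u^T B^i) over the box
# 0 <= x <= 1, a stand-in for the polyhedra {A^i x^i <= b^i}.

def min_linear_over_box(c):
    # minimize c^T x over 0 <= x_j <= 1: pick x_j = 1 when c_j < 0, else 0.
    x = [1.0 if cj < 0 else 0.0 for cj in c]
    return sum(cj * xj for cj, xj in zip(c, x)), x

def L_star(u, c1, c2, B1, B2, d):
    # L*(u) = -u^T d + (block-1 minimum) + (block-2 minimum).
    red1 = [c1[j] + sum(u[i] * B1[i][j] for i in range(len(u))) for j in range(len(c1))]
    red2 = [c2[j] + sum(u[i] * B2[i][j] for i in range(len(u))) for j in range(len(c2))]
    v1, _ = min_linear_over_box(red1)
    v2, _ = min_linear_over_box(red2)
    return -sum(u[i] * d[i] for i in range(len(d))) + v1 + v2

c1, c2 = [-1.0, 2.0], [3.0, -4.0]
B1 = [[1.0, 0.0]]; B2 = [[0.0, 1.0]]; d = [1.0]   # one coupling constraint
print(L_star([0.0], c1, c2, B1, B2, d))   # -5.0
print(L_star([1.0], c1, c2, B1, B2, d))   # -4.0
```

Each evaluation of L∗(u) touches the two blocks independently, which is exactly the source of the computational savings.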

10 Illustration of Lagrange Duality in Discrete Optimization

In order to suggest the computational power of duality theory and to illustrate duality constructs in discrete optimization, let us consider the simple constrained shortest path problem portrayed in Figure 3.

Figure 3: A constrained shortest path problem. (The figure shows a directed network on nodes 1 through 6 with a cost on each arc; the arc costs appear in the formulation CSPP below.)

The objective in this problem is to find the shortest (least cost) path from node 1 to node 6 subject to the constraint that the chosen path uses


at least four arcs. In an applied context the network might contain millions of nodes and arcs and billions of paths. For example, in one application setting that can be solved using the ideas considered in this discussion, each arc would have both a travel cost and a positive utility associated with, say, the scenic beauty of the route (which for our example is 1 for each arc). The objective is to find the least cost path between a given pair of nodes from among all those paths whose travel utility is at least 4 units.

Table 1 lists all paths from node 1 to node 6 in the graph pictured in Figure 3. Of the seven different paths from node 1 to node 6, notice that there are exactly four different paths that use at least four arcs; these are path 1-2-3-4-6, path 1-2-3-5-6, path 1-3-5-4-6, and path 1-2-3-5-4-6. Since path 1-3-5-4-6 has the least cost of these four paths (at a cost of 5), this path is the optimal solution of the constrained shortest path problem.

Path           Cost   Number of Arcs
1-2-4-6         3          3
1-3-4-6         5          3
1-3-5-6         6          3
1-2-3-4-6       6          4
1-2-3-5-6       7          4
1-3-5-4-6       5          4
1-2-3-5-4-6     6          5

Table 1: Paths from node 1 to node 6.

Now suppose that we have available an efficient algorithm for solving (unconstrained) shortest path problems (indeed, such algorithms are readily available in practice), and that we wish to use this algorithm to solve the given problem.

Let us first formulate the constrained shortest path problem as an integer program:


CSPP :

z∗ = min_{xij}  x12 + x13 + x23 + x24 + 3x34 + 2x35 + x54 + x46 + 3x56

s.t. 4 −x12 −x13 −x23 −x24 −x34 −x35 −x54 −x46 −x56 ≤ 0

x12 +x13 = 1

x12 −x23 −x24 = 0

x13 +x23 −x34 −x35 = 0

x24 +x34 +x54 −x46 = 0

x35 −x54 −x56 = 0

x46 +x56 = 1

xij ∈ {0, 1} for all xij .

In this formulation, x_ij is a binary variable indicating whether or not arc i–j is chosen as part of the optimal path. The first constraint in this formulation specifies that the optimal path must contain at least four arcs, which is written as a rearrangement of the expression:

x12 + x13 + x23 + x24 + x34 + x35 + x54 + x46 + x56 ≥ 4 .

The remaining constraints are the familiar constraints describing a shortest path problem. They indicate that one unit of flow is available at node 1 and must be sent to node 6; nodes 2, 3, 4, and 5 act as intermediate nodes that neither generate nor consume flow. Since the decision variables x_ij must be integer, any feasible solution to these constraints traces out a path from node 1 to node 6. The objective function determines the cost of any such path.


Notice that if we eliminate the first constraint from this problem, then the problem becomes an unconstrained shortest path problem that can be solved by our available algorithm. Let us define the ground-set X by all constraints except the first constraint. Then X is the set of solutions of:

x12 +x13 = 1

x12 −x23 −x24 = 0

x13 +x23 −x34 −x35 = 0

x24 +x34 +x54 −x46 = 0

x35 −x54 −x56 = 0

x46 +x56 = 1

xij ∈ {0, 1} for all xij .

(2)

Then the ground-set X defines the set of paths from node 1 to node 6. Let us then dualize on the first (the complicating) constraint by moving it to the objective function multiplied by the Lagrange multiplier u. The objective function then becomes:

minimum_{xij}  Σ_{i,j} c_ij x_ij + u ( 4 − Σ_{i,j} x_ij ) ,

and collecting the terms, the modified problem becomes:


L∗(u) = min_{xij}  (1 − u)x12 + (1 − u)x13 + (1 − u)x23
                   + (1 − u)x24 + (3 − u)x34 + (2 − u)x35
                   + (1 − u)x54 + (1 − u)x46 + (3 − u)x56 + 4u

s.t.

x12 +x13 = 1

x12 −x23 −x24 = 0

x13 +x23 −x34 −x35 = 0

x24 +x34 +x54 −x46 = 0

x35 −x54 −x56 = 0

x46 +x56 = 1

xij ∈ {0, 1} for all xij .

Let L∗(u) denote the optimal objective function value of this problem, for any given multiplier u.

Notice that for a fixed value of u, 4u is a constant and the problem becomes a shortest path problem with modified arc costs given by

c′_ij = c_ij − u .

Note that for a fixed value of u, this problem can be solved by our available algorithm for the shortest path problem. Also notice that for any fixed value of u, L∗(u) is a lower bound on the optimal objective value z∗ of the constrained shortest path problem defined by the original formulation. To obtain the best lower bound, we need to solve the Lagrangian dual problem:

D : v∗ = maximum_u  L∗(u)

s.t. u ≥ 0 .


To relate this problem to the earlier material on duality, the ground-set X is the set of feasible paths, namely the solutions of (2); in this instance, X is the finite (discrete) set of paths connecting node 1 to node 6. Suppose that we plot the cost z and the number of arcs r used in each of these paths, as in Figure 4.

Figure 4: Resources and costs for the constrained shortest-path problem. (The figure plots the seven (r, z) pairs from Table 1 together with conv(S), showing z∗ = 5 and v∗ = 4.5.)

We obtain the seven dots in this figure corresponding to the seven paths listed in Table 1. Now, because our original problem was formulated as a minimization problem, the geometric dual is obtained by finding the line that lies on or below these dots and that has the largest intercept with the feasibility line. Note from Figure 4 that the optimal value of the dual is v∗ = 4.5, whereas the optimal value of the original primal problem is z∗ = 5. In this example, because the cost-resource set is not convex (it is simply the seven dots), we encounter a duality gap. In general, integer optimization models will usually have such duality gaps, but they are typically not very large in practice. Nevertheless, the dual bounding information provided by v∗ ≤ z∗ can be used in branch and bound procedures to devise efficient algorithms for solving integer optimization models.

Let us summarize. The original constrained shortest path problem might be very difficult to solve directly (at least when there are hundreds of thousands of


nodes and arcs). Duality theory permits us to relax (eliminate) the complicating constraint and to solve a much easier Lagrangian shortest path problem with modified arc costs c′_ij = c_ij − u. Solving for any u generates a lower bound L∗(u) on the value z∗ of the original problem. Finding the best lower bound max_{u≥0} L∗(u) requires that we develop an efficient procedure for solving for the optimal u. Duality theory permits us to replace a single and difficult problem (in this instance, a constrained shortest-path problem) with a sequence of much easier problems (in this instance, a number of unconstrained shortest path problems), one for each value of u encountered in our search procedure. Practice on a wide variety of problem applications has shown that the dual problem and subsequent branch and bound procedures can be extremely effective in many applications.

To conclude this discussion, note from Figure 4 that the optimal value for u in the dual problem is u = 1.5. Also note that if we subtract u = 1.5 from the cost of every arc in the network, then the shortest path (with respect to the modified costs) has cost −1.5. Both paths 1-2-4-6 and 1-2-3-5-4-6 are shortest paths for this modified problem. Adding 4u = 4 × 1.5 = 6 to the modified shortest path cost of −1.5 gives a value of 6 − 1.5 = 4.5, which equals v∗. Note that if we permitted the flow in the original problem to be split among multiple paths (i.e., formulate the problem as a linear optimization problem instead of as an integer optimization problem), then the optimal solution would send one half of the flow on each of the paths 1-2-4-6 and 1-2-3-5-4-6 and incur a cost of:

(1/2) × 3 + (1/2) × 6 = 4.5 .

In general, the value of the dual problem is always identical to the value of the original (primal) problem if we permit convex combinations of primal solutions. This equivalence between convexification and dualization is easy to see geometrically.
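The whole discussion can be reproduced in a few lines by enumerating the seven paths of Table 1 (a brute-force stand-in for the shortest path algorithm; the grid search over u is for illustration only):

```python
# (cost, number of arcs) for each 1-to-6 path of Table 1.
paths = {
    "1-2-4-6": (3, 3), "1-3-4-6": (5, 3), "1-3-5-6": (6, 3),
    "1-2-3-4-6": (6, 4), "1-2-3-5-6": (7, 4),
    "1-3-5-4-6": (5, 4), "1-2-3-5-4-6": (6, 5),
}

def L_star(u):
    # L*(u) = 4u + min over paths of (cost - u * arcs): the Lagrangian
    # shortest path with every arc cost reduced by u, plus the constant 4u.
    return 4 * u + min(c - u * r for (c, r) in paths.values())

best_u = max((i / 100.0 for i in range(301)), key=L_star)
print(best_u, L_star(best_u))   # 1.5 4.5 — the dual optimum v* = 4.5 < z* = 5
```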

11 Duality Theory Exercises

1. Consider the problem

(P) min_x  f(x) = −√x
    s.t.  g(x) = x ≤ 0
          x ∈ X = [0, ∞) .

Formulate the dual of this problem.


2. Consider the problem

(P) min_x  c^T x
    s.t.  b − Ax ≤ 0
          x ≥ 0 .

a. Formulate a dual based on g(x) = (g^1(x), g^2(x)) = (b − Ax, −x) and X = R^n.

b. How does the dual derived in part (a.) compare with the dual derived in the notes Section 2.3?

3. Consider the two equivalent problems

(P1) min_x  ∥x∥          (P2) min_x  (1/2) x^T x
     s.t.  b − Ax ≤ 0         s.t.  b − Ax ≤ 0
           x ∈ R^n                  x ∈ R^n .

Derive the dual problems (D1) and (D2) of (P1) and (P2). What is the relation between (D1) and (D2)?

4. Let P be given as a function of the parameter y:

Py : z(y) = min  −x_1 + 5x_2 − 7x_3
            s.t.  x_1 − 3x_2 + x_3 ≤ y ,
                  x ∈ X = {x ∈ R^3 : x_j = 0 or 1, j = 1, 2, 3} .

Construct the dual Dy as a function of the parameter y. Graph the function z(y) as a function of y. For what values of y is there a duality gap?

5. Consider the problem

(P) min_x  c^T x + (1/2) x^T Q x
    s.t.  b − Ax ≤ 0 ,

where Q is symmetric and positive semidefinite.

a. What is the dual?

b. Show that the dual of the dual is the primal.


6. Consider the program

(P) min_x  f(x)
    s.t.  g_i(x) ≤ 0 , i = 1, . . . , m
          x ∈ X ,

where f(·) and g_i(·) are convex functions and the set X is a convex set. Consider the matrix below, whose columns describe properties of the dual problem D and whose rows describe properties of the primal problem P; each of the 24 cells is numbered:

                                     Property of D
                          v∗ finite   v∗ finite      v∗ finite   v∗ finite
Property of P             attained    not attained   attained    not attained   v∗ = +∞   v∗ = −∞
                          v∗ = z∗     v∗ = z∗        v∗ < z∗     v∗ < z∗
z∗ finite, Slater point       1           5              9           13            17        21
z∗ finite, no Slater point    2           6             10           14            18        22
z∗ = −∞                       3           7             11           15            19        23
z∗ = +∞ (infeasible)          4           8             12           16            20        24

Prove by citing relevant theorems from the notes, etc., that the following cases cannot occur: 2, 3, 4, 5, 7, 8, 9, 11, 13, 15, 17, 18, 19, and 21.

7. Let X ⊂ R^n, Y ⊂ R^m, and f(x, y) : (X × Y) → R. A saddlepoint (x̄, ȳ) of f(x, y) is a point (x̄, ȳ) ∈ X × Y such that

f(x̄, y) ≤ f(x̄, ȳ) ≤ f(x, ȳ) for all x ∈ X and y ∈ Y .

Prove the equivalence of the following three results:

(a) There exists a saddlepoint (x̄, ȳ) of f(x, y).

(b) There exist x̄ ∈ X and ȳ ∈ Y for which max_{y∈Y} f(x̄, y) = min_{x∈X} f(x, ȳ).

(c) min_{x∈X} {max_{y∈Y} f(x, y)} = max_{y∈Y} {min_{x∈X} f(x, y)}.


8. Consider the logarithmic barrier problem:

P(θ) : minimize_x  c^T x − θ Σ_{j=1}^n ln(x_j)
       s.t.  Ax = b
             x > 0 .

Compute a dual of P(θ) by dualizing on the constraints “Ax = b.” What is the relationship between this dual and the dual of the original linear program (without the logarithmic barrier function)?

9. Consider the program

P(γ) : minimize_x  − Σ_{j=1}^n ln(x_j)
       s.t.  Ax = b
             c^T x = γ
             x > 0 .

Compute a dual of P(γ) by dualizing on the constraints Ax = b and c^T x = γ. What is the relationship between the optimality conditions of P(θ) in the previous exercise and P(γ)?

Consider the program

P(δ) : minimize_x  − Σ_{j=1}^n ln(x_j) − ln(δ − c^T x)
       s.t.  Ax = b
             c^T x < δ
             x > 0 .

Compute a dual of P(δ) by dualizing on the constraints Ax = b. What is the relationship between the optimality conditions of P(δ) and P(θ) and P(γ) of the previous two exercises?

10. Consider the conic form problem CP of convex optimization

CP : z∗ = minimum_x  c^T x
          s.t.  Ax = b
                x ∈ K ,


and the associated conic dual problem:

CD : v∗ = maximum_{u,s}  b^T u
          s.t.  A^T u + s = c
                s ∈ K∗ ,

where K is a closed convex cone. Show that “the dual of the dual is the primal,” that is, that the conic dual of CD is CP.

11. Prove Corollary 2, which asserts that the existence of a Slater point for the conic dual problem guarantees strong duality and that the primal attains its optimum.

12. Consider the following “minimax” problems:

min_{x∈X} max_{y∈Y} ϕ(x, y)   and   max_{y∈Y} min_{x∈X} ϕ(x, y)

where X and Y are nonempty compact convex sets in R^n and R^m, respectively, and ϕ(x, y) is convex in x for fixed y, and is concave in y for fixed x.

(a) Show that min_{x∈X} max_{y∈Y} ϕ(x, y) ≥ max_{y∈Y} min_{x∈X} ϕ(x, y) in the absence of any convexity/concavity assumptions on X, Y , and/or ϕ(·, ·).

(b) Show that f(x) := max_{y∈Y} ϕ(x, y) is a convex function in x and that g(y) := min_{x∈X} ϕ(x, y) is a concave function in y.

(c) Use a separating hyperplane theorem to prove:

min_{x∈X} max_{y∈Y} ϕ(x, y) = max_{y∈Y} min_{x∈X} ϕ(x, y) .

13. Let X and Y be nonempty sets in R^n, and let f(·), g(·) : R^n → R. Consider the conjugate functions f∗(·) and g∗(·) defined as follows:

f∗(u) := min_{x∈X} {f(x) − u^T x} ,

and

g∗(u) := max_{x∈Y} {g(x) − u^T x} .

(a) Construct a geometric interpretation of f∗(·) and g∗(·).

(b) Show that f∗(·) is a concave function on X∗ := {u | f∗(u) > −∞}, and g∗(·) is a convex function on Y ∗ := {u | g∗(u) < +∞}.


(c) Prove the following weak duality theorem between the conjugate primal problem min{f(x) − g(x) | x ∈ X ∩ Y } and the conjugate dual problem max{f∗(u) − g∗(u) | u ∈ X∗ ∩ Y ∗}:

min{f(x) − g(x) | x ∈ X ∩ Y } ≥ max{f∗(u) − g∗(u) | u ∈ X∗ ∩ Y ∗} .

(d) Now suppose that f(·) is a convex function, g(·) is a concave function, int X ∩ int Y ≠ ∅, and min{f(x) − g(x) | x ∈ X ∩ Y } is finite. Show that equality in part (c) holds true and that max{f∗(u) − g∗(u) | u ∈ X∗ ∩ Y ∗} is attained for some u = u∗.

(e) Consider a standard inequality constrained nonlinear optimization problem using the following notation:

OP : minimum_x  f(x)
     s.t.  g_1(x) ≤ 0,
           ...
           g_m(x) ≤ 0,
           x ∈ X .

By suitable choices of f(·), g(·), X, and Y , formulate this problem as an instance of the conjugate primal problem min{f(x) − g(x) | x ∈ X ∩ Y }. What is the form of the resulting conjugate dual problem max{f∗(u) − g∗(u) | u ∈ X∗ ∩ Y ∗}?

14. Consider the following problem:

z∗ = min  x_1 + x_2
     s.t.  x_1^2 + x_2^2 = 4
           −2x_1 − x_2 ≤ 4 .

(a) Formulate the Lagrange dual of this problem by incorporating both constraints into the objective function via multipliers u_1, u_2.

(b) Compute the gradient of L∗(u) at the point u = (1, 2).

(c) Starting from u = (1, 2), perform one iteration of the steepest ascent method for the dual problem. In particular, solve the following problem where d = ∇L∗(u):

max_α  L∗(u + αd)
s.t.  u + αd ≥ 0
      α ≥ 0 .


15. Prove that Remark 1 is true.

16. Consider the following very general conic problem:

GCP : z∗ = minimum_x  c^T x
           s.t.  Ax − b ∈ K_1
                 x ∈ K_2 ,

where K_1 ⊂ R^m and K_2 ⊂ R^n are each a closed convex cone. Derive the following conic dual for this problem:

GCD : v∗ = maximum_y  b^T y
           s.t.  c − A^T y ∈ K∗_2
                 y ∈ K∗_1 ,

and show that the dual of GCD is GCP. How is the conic format of Subsection 3.4 a special case of GCP?

17. Consider the example of Subsection 8.3, for which there is a finite duality gap.

(a) In this example the primal problem is a convex problem. Why is there nevertheless a duality gap? What hypotheses are absent that otherwise would guarantee strong duality?

(b) The text in Subsection 8.3 reads “It is straightforward to show that L∗(u_1, u_2) is given by:

L∗(u_1, u_2) =
    −10u_2   if ∥(1 − u_2, −u_1)∥ ≤ u_1
    −∞       if ∥(1 − u_2, −u_1)∥ > u_1 .”

Give a formal demonstration of this “straightforward” statement.

18. For a given scalar γ, the γ-level set Hγ of the dual problem D is given by:

Hγ := {u : u ≥ 0, L∗(u) ≥ γ} .

Show that if OP has a feasible solution x^0 satisfying g(x^0) < 0, then the level sets of D must be bounded, i.e., Hγ is a bounded set for all γ.


19. Consider the optimization problem

P : minimize_x  f(x)

s.t. g(x) ≤ 0

x ∈ X ,

and its Lagrangian relaxation:

L∗(u) = min_{x∈X}  f(x) + u^T g(x) .

Let v∗ = max_{u≥0} L∗(u) denote the optimal value of the Lagrangian dual problem.

Suppose that the variables x are partitioned as x = (ξ, w) and that f(x), g(x), and X are of the form:

f(x) = f(ξ, w) = c^T ξ + d(w)
g(x) = g(ξ, w) = b − Aξ + h(w)
X = {(ξ, w) | ξ ≥ 0} .

That is, ξ models the linear portion of the problem, and w together with the functions d(w) and h(w) models the nonlinear portion of the problem.

Show for this problem that

L∗(u) =
    b^T u + min_w {d(w) + u^T h(w)}   if A^T u ≤ c
    −∞                                if A^T u ≰ c .