L.Vandenberghe ECE236C(Spring2019) …vandenbe/236C/lectures/conj.pdfIndicatorfunctionandnorm...

L. Vandenberghe ECE236C (Spring 2019)

5. Conjugate functions

• closed functions

• conjugate function

• duality

5.1

Closed set

a set C is closed if it contains its boundary:

xk ∈ C, xk → x =⇒ x ∈ C

Operations that preserve closedness

• the intersection of (finitely or infinitely many) closed sets is closed

• the union of a finite number of closed sets is closed

• inverse under linear mapping: {x | Ax ∈ C} is closed if C is closed

Conjugate functions 5.2

Image under linear mapping

the image of a closed set under a linear mapping is not necessarily closed

Example

C = {(x1, x2) ∈ R2+ | x1x2 ≥ 1}, A =

[1 0

], AC = R++

Sufficient condition: AC is closed if

• C is closed and convex

• and C does not have a recession direction in the nullspace of A, i.e.,

Ay = 0, x ∈ C, x + αy ∈ C for all α ≥ 0 =⇒ y = 0

in particular, this holds for any matrix A if C is bounded


Closed function

Definition: a function is closed if its epigraph is a closed set

Examples

• f (x) = − log(1 − x2) with dom f = {x | |x | < 1}• f (x) = x log x with dom f = R+ and f (0) = 0

• indicator function of a closed set C:

δC(x) ={

0 x ∈ C+∞ otherwise

Not closed

• f (x) = x log x with dom f = R++, or with dom f = R+ and f (0) = 1

• indicator function of a set C if C is not closed


Properties

Sublevel sets: f is closed if and only if all its sublevel sets are closed

Minimum: if f is closed with bounded sublevel sets then it has a minimizer

Common operations on convex functions that preserve closedness

• sum: f = f1 + f2 is closed if f1 and f2 are closed

• composition with affine mapping: f = g(Ax + b) is closed if g is closed

• supremum: f (x) = supα fα(x) is closed if each function fα is closed

in each case, we assume dom f , ∅


Outline



• duality

Conjugate function

the conjugate of a function f is

f ∗(y) = supx∈dom f

(yT x − f (x))

f ∗ is closed and convex (even when f is not)

Fenchel’s inequality: the definition implies that

f (x) + f ∗(y) ≥ xT y for all x, y

this is an extension to non-quadratic convex f of the inequality

12

xT x +12yT y ≥ xT y


Quadratic function

f (x) = 12

xT Ax + bT x + c

Strictly convex case (A � 0)

f ∗(y) = 12(y − b)T A−1(y − b) − c

General convex case (A � 0)

f ∗(y) = 12(y − b)T A†(y − b) − c, dom f ∗ = range(A) + b


Negative entropy and negative logarithm

Negative entropy

f (x) =n∑

i=1xi log xi f ∗(y) =

n∑i=1

eyi−1

Negative logarithm

f (x) = −n∑

i=1log xi f ∗(y) = −

n∑i=1

log(−yi) − n

Matrix logarithm

f (X) = − log det X (dom f = Sn++) f ∗(Y ) = − log det(−Y ) − n


Indicator function and norm

Indicator of convex set C: conjugate is the support function of C

δC(x) ={

0 x ∈ C+∞ x < C δ∗C(y) = sup

x∈CyT x

Indicator of convex cone C: conjugate is indicator of polar (negative dual) cone

δ∗C(y) = δ−C∗(y) = δC∗(−y) ={

0 yT x ≤ 0 ∀x ∈ C+∞ otherwise

Norm: conjugate is indicator of unit ball for dual norm

f (x) = ‖x‖ f ∗(y) ={

0 ‖y‖∗ ≤ 1+∞ ‖y‖∗ > 1

(see next page)Conjugate functions 5.9

Proof: recall the definition of dual norm

‖y‖∗ = sup‖x‖≤1

xT y

to evaluate f ∗(y) = supx (yT x − ‖x‖) we distinguish two cases

• if ‖y‖∗ ≤ 1, then (by definition of dual norm)

yT x ≤ ‖x‖ for all x

and equality holds if x = 0; therefore supx (yT x − ‖x‖) = 0

• if ‖y‖∗ > 1, there exists an x with ‖x‖ ≤ 1, xT y > 1; then

f ∗(y) ≥ yT(t x) − ‖t x‖ = t(yT x − ‖x‖)

and right-hand side goes to infinity if t →∞


Calculus rules

Separable sum

f (x1, x2) = g(x1) + h(x2) f ∗(y1, y2) = g∗(y1) + h∗(y2)

Scalar multiplication (α > 0)

f (x) = αg(x) f ∗(y) = αg∗(y/α)

f (x) = αg(x/α) f ∗(y) = αg∗(y)

• the operation f (x) = αg(x/α) is sometimes called “right scalar multiplication”

• a convenient notation is f = gα for the function (gα)(x) = αg(x/α)• conjugates can be written concisely as (gα)∗ = αg∗ and (αg)∗ = g∗α


Calculus rules

Addition to affine function

f (x) = g(x) + aT x + b f ∗(y) = g∗(y − a) − b

Translation of argument

f (x) = g(x − b) f ∗(y) = bT y + g∗(y)

Composition with invertible linear mapping (A square and nonsingular)

f (x) = g(Ax) f ∗(y) = g∗(A−T y)

Infimal convolution

f (x) = infu+v=x

(g(u) + h(v)) f ∗(y) = g∗(y) + h∗(y)


The second conjugate

f ∗∗(x) = supy∈dom f ∗

(xT y − f ∗(y))

• f ∗∗ is closed and convex

• from Fenchel’s inequality, xT y − f ∗(y) ≤ f (x) for all y and x; therefore

f ∗∗(x) ≤ f (x) for all x

equivalently, epi f ⊆ epi f ∗∗ (for any f )

• if f is closed and convex, then

f ∗∗(x) = f (x) for all x

equivalently, epi f = epi f ∗∗ (if f is closed and convex); proof on next page


Proof (by contradiction): assume f is closed and convex, and epi f ∗∗ , epi f

suppose (x, f ∗∗(x)) < epi f ; then there is a strict separating hyperplane:[ab

]T [z − x

s − f ∗∗(x)]≤ c < 0 for all (z, s) ∈ epi f

for some a, b, c with b ≤ 0 (b > 0 gives a contradiction as s→∞)

• if b < 0, define y = a/(−b) and maximize left-hand side over (z, s) ∈ epi f :

f ∗(y) − yT x + f ∗∗(x) ≤ c/(−b) < 0

this contradicts Fenchel’s inequality

• if b = 0, choose y ∈ dom f ∗ and add small multiple of (y,−1) to (a, b):[a + ε y−ε

]T [z − x

s − f ∗∗(x)]≤ c + ε

(f ∗(y) − xT y + f ∗∗(x)

)< 0

now apply the argument for b < 0


Conjugates and subgradients

if f is closed and convex, then

y ∈ ∂ f (x) ⇐⇒ x ∈ ∂ f ∗(y) ⇐⇒ xT y = f (x) + f ∗(y)

Proof. if y ∈ ∂ f (x), then f ∗(y) = supu (yTu − f (u)) = yT x − f (x); hence

f ∗(v) = supu(vTu − f (u))

≥ vT x − f (x)= xT(v − y) − f (x) + yT x

= f ∗(y) + xT(v − y)

this holds for all v; therefore, x ∈ ∂ f ∗(y)reverse implication x ∈ ∂ f ∗(y) =⇒ y ∈ ∂ f (x) follows from f ∗∗ = f


Example

−1 1x

f (x) = δ[−1,1](x)

y

f ∗(y) = |y |

1−1 x

∂ f (x)

1

−1

y

∂ f ∗(y)


Strongly convex function

Definition (page 1.18) f is µ-strongly convex (for ‖ · ‖) if dom f is convex and

f (θx + (1 − θ)y) ≤ θ f (x) + (1 − θ) f (y) − µ2θ(1 − θ)‖x − y‖2

for all x, y ∈ dom f and θ ∈ [0,1]

First-order condition

• if f is µ-strongly convex, then

f (y) ≥ f (x) + gT(y − x) + µ2‖y − x‖2 for all x, y ∈ dom f , g ∈ ∂ f (x)

• for differentiable f this is the inequality (4) on page 1.19


Proof

• recall the definition of directional derivative (page 2.28 and 2.29):

f ′(x, y − x) = infθ>0

f (x + θ(y − x)) − f (x)θ

and the infimum is approached as θ → 0

• if f is µ-strongly convex and subdifferentiable at x, then for all y ∈ dom f ,

f ′(x, y − x) ≤ infθ∈(0,1]

(1 − θ) f (x) + θ f (y) − (µ/2)θ(1 − θ)‖y − x‖2 − f (x)θ

= f (y) − f (x) − µ2‖y − x‖2

• from page 2.31, the directional derivative is the support function of ∂ f (x):

gT(y − x) ≤ supg∈∂ f (x)

gT(y − x)

= f ′(x; y − x)≤ f (y) − f (x) − µ

2‖y − x‖2


Conjugate of strongly convex function

assume f is closed and strongly convex with parameter µ > 0 for the norm ‖ · ‖

• f ∗ is defined for all y (i.e., dom f ∗ = Rn)

• f ∗ is differentiable everywhere, with gradient

∇ f ∗(y) = argmaxx(yT x − f (x))

• ∇ f ∗ is Lipschitz continuous with constant 1/µ for the dual norm ‖ · ‖∗:

‖∇ f ∗(y) − ∇ f ∗(y′)‖ ≤ 1µ‖y − y′‖∗ for all y and y′


Proof: if f is strongly convex and closed

• yT x − f (x) has a unique maximizer x for every y

• x maximizes yT x − f (x) if and only if y ∈ ∂ f (x); from page 5.15

y ∈ ∂ f (x) ⇐⇒ x ∈ ∂ f ∗(y) = {∇ f ∗(y)}

hence ∇ f ∗(y) = argmaxx (yT x − f (x))• from first-order condition on page 5.17: if y ∈ ∂ f (x), y′ ∈ ∂ f (x′):

f (x′) ≥ f (x) + yT(x′ − x) + µ2‖x′ − x‖2

f (x) ≥ f (x′) + (y′)T(x − x′) + µ2‖x′ − x‖2

combining these inequalities shows

µ‖x − x′‖2 ≤ (y − y′)T(x − x′) ≤ ‖y − y′‖∗‖x − x′‖

• now substitute x = ∇ f ∗(y) and x′ = ∇ f ∗(y′)


Outline



• duality

Duality

primal: minimize f (x) + g(Ax)dual: maximize −g∗(z) − f ∗(−AT z)

• follows from Lagrange duality applied to reformulated primal

minimize f (x) + g(y)subject to Ax = y

dual function for the formulated problem is:

infx,y( f (x) + zT Ax + g(y) − zT y) = − f ∗(−AT z) − g∗(z)

• Slater’s condition (for convex f , g): strong duality holds if there exists an x with

x ∈ int dom f , Ax ∈ int dom g

this also guarantees that the dual optimum is attained, if optimal value is finite


Set constraint

minimize f (x)subject to Ax − b ∈ C

Primal and dual problem

primal: minimize f (x) + δC(Ax − b)dual: maximize −bT z − δ∗C(z) − f ∗(−AT z)

Examples

constraint set C support function δ∗C(z)equality Ax = b {0} 0

norm inequality ‖Ax − b‖ ≤ 1 unit ‖ · ‖-ball ‖z‖∗conic inequality Ax �K b −K δK∗(z)


Norm regularization

minimize f (x) + ‖Ax − b‖

• take g(y) = ‖y − b‖ in general problem

minimize f (x) + g(Ax)

• conjugate of ‖ · ‖ is indicator of unit ball for dual norm

g∗(z) = bT z + δB(z) where B = {z | ‖z‖∗ ≤ 1}

• hence, dual problem can be written as

maximize −bT z − f ∗(−AT z)subject to ‖z‖∗ ≤ 1


Optimality conditions

minimize f (x) + g(y)subject to Ax = y

assume f , g are convex and Slater’s condition holds

Optimality conditions: x is optimal if and only if there exists a z such that

1. primal feasibility: x ∈ dom f and y = Ax ∈ dom g

2. x and y = Ax are minimizers of the Lagrangian f (x) + zT Ax + g(y) − zT y:

−AT z ∈ ∂ f (x), z ∈ ∂g(Ax)

if g is closed, this can be written symmetrically as

−AT z ∈ ∂ f (x), Ax ∈ ∂g∗(z)


References

• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and MinimizationAlgoritms (1993), chapter X.

• D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization(2003), chapter 7.

• R. T. Rockafellar, Convex Analysis (1970).


L.Vandenberghe ECE236C(Spring2019) …vandenbe/236C/lectures/conj.pdfIndicatorfunctionandnorm...

Documents

Transcript of L.Vandenberghe ECE236C(Spring2019) …vandenbe/236C/lectures/conj.pdfIndicatorfunctionandnorm...