L.Vandenberghe ECE236C(Spring2019) …vandenbe/236C/lectures/conj.pdfIndicatorfunctionandnorm...
Transcript of L.Vandenberghe ECE236C(Spring2019) …vandenbe/236C/lectures/conj.pdfIndicatorfunctionandnorm...
L. Vandenberghe ECE236C (Spring 2019)
5. Conjugate functions
• closed functions
• conjugate function
• duality
5.1
Closed set
a set C is closed if it contains its boundary:
xk ∈ C, xk → x =⇒ x ∈ C
Operations that preserve closedness
• the intersection of (finitely or infinitely many) closed sets is closed
• the union of a finite number of closed sets is closed
• inverse under linear mapping: {x | Ax ∈ C} is closed if C is closed
Conjugate functions 5.2
Image under linear mapping
the image of a closed set under a linear mapping is not necessarily closed
Example
C = {(x1, x2) ∈ R2+ | x1x2 ≥ 1}, A =
[1 0
], AC = R++
Sufficient condition: AC is closed if
• C is closed and convex
• and C does not have a recession direction in the nullspace of A, i.e.,
Ay = 0, x ∈ C, x + αy ∈ C for all α ≥ 0 =⇒ y = 0
in particular, this holds for any matrix A if C is bounded
Conjugate functions 5.3
Closed function
Definition: a function is closed if its epigraph is a closed set
Examples
• f (x) = − log(1 − x2) with dom f = {x | |x | < 1}• f (x) = x log x with dom f = R+ and f (0) = 0
• indicator function of a closed set C:
δC(x) ={
0 x ∈ C+∞ otherwise
Not closed
• f (x) = x log x with dom f = R++, or with dom f = R+ and f (0) = 1
• indicator function of a set C if C is not closed
Conjugate functions 5.4
Properties
Sublevel sets: f is closed if and only if all its sublevel sets are closed
Minimum: if f is closed with bounded sublevel sets then it has a minimizer
Common operations on convex functions that preserve closedness
• sum: f = f1 + f2 is closed if f1 and f2 are closed
• composition with affine mapping: f = g(Ax + b) is closed if g is closed
• supremum: f (x) = supα fα(x) is closed if each function fα is closed
in each case, we assume dom f , ∅
Conjugate functions 5.5
Outline
• closed functions
• conjugate function
• duality
Conjugate function
the conjugate of a function f is
f ∗(y) = supx∈dom f
(yT x − f (x))
f ∗ is closed and convex (even when f is not)
Fenchel’s inequality: the definition implies that
f (x) + f ∗(y) ≥ xT y for all x, y
this is an extension to non-quadratic convex f of the inequality
12
xT x +12yT y ≥ xT y
Conjugate functions 5.6
Quadratic function
f (x) = 12
xT Ax + bT x + c
Strictly convex case (A � 0)
f ∗(y) = 12(y − b)T A−1(y − b) − c
General convex case (A � 0)
f ∗(y) = 12(y − b)T A†(y − b) − c, dom f ∗ = range(A) + b
Conjugate functions 5.7
Negative entropy and negative logarithm
Negative entropy
f (x) =n∑
i=1xi log xi f ∗(y) =
n∑i=1
eyi−1
Negative logarithm
f (x) = −n∑
i=1log xi f ∗(y) = −
n∑i=1
log(−yi) − n
Matrix logarithm
f (X) = − log det X (dom f = Sn++) f ∗(Y ) = − log det(−Y ) − n
Conjugate functions 5.8
Indicator function and norm
Indicator of convex set C: conjugate is the support function of C
δC(x) ={
0 x ∈ C+∞ x < C δ∗C(y) = sup
x∈CyT x
Indicator of convex cone C: conjugate is indicator of polar (negative dual) cone
δ∗C(y) = δ−C∗(y) = δC∗(−y) ={
0 yT x ≤ 0 ∀x ∈ C+∞ otherwise
Norm: conjugate is indicator of unit ball for dual norm
f (x) = ‖x‖ f ∗(y) ={
0 ‖y‖∗ ≤ 1+∞ ‖y‖∗ > 1
(see next page)Conjugate functions 5.9
Proof: recall the definition of dual norm
‖y‖∗ = sup‖x‖≤1
xT y
to evaluate f ∗(y) = supx (yT x − ‖x‖) we distinguish two cases
• if ‖y‖∗ ≤ 1, then (by definition of dual norm)
yT x ≤ ‖x‖ for all x
and equality holds if x = 0; therefore supx (yT x − ‖x‖) = 0
• if ‖y‖∗ > 1, there exists an x with ‖x‖ ≤ 1, xT y > 1; then
f ∗(y) ≥ yT(t x) − ‖t x‖ = t(yT x − ‖x‖)
and right-hand side goes to infinity if t →∞
Conjugate functions 5.10
Calculus rules
Separable sum
f (x1, x2) = g(x1) + h(x2) f ∗(y1, y2) = g∗(y1) + h∗(y2)
Scalar multiplication (α > 0)
f (x) = αg(x) f ∗(y) = αg∗(y/α)
f (x) = αg(x/α) f ∗(y) = αg∗(y)
• the operation f (x) = αg(x/α) is sometimes called “right scalar multiplication”
• a convenient notation is f = gα for the function (gα)(x) = αg(x/α)• conjugates can be written concisely as (gα)∗ = αg∗ and (αg)∗ = g∗α
Conjugate functions 5.11
Calculus rules
Addition to affine function
f (x) = g(x) + aT x + b f ∗(y) = g∗(y − a) − b
Translation of argument
f (x) = g(x − b) f ∗(y) = bT y + g∗(y)
Composition with invertible linear mapping (A square and nonsingular)
f (x) = g(Ax) f ∗(y) = g∗(A−T y)
Infimal convolution
f (x) = infu+v=x
(g(u) + h(v)) f ∗(y) = g∗(y) + h∗(y)
Conjugate functions 5.12
The second conjugate
f ∗∗(x) = supy∈dom f ∗
(xT y − f ∗(y))
• f ∗∗ is closed and convex
• from Fenchel’s inequality, xT y − f ∗(y) ≤ f (x) for all y and x; therefore
f ∗∗(x) ≤ f (x) for all x
equivalently, epi f ⊆ epi f ∗∗ (for any f )
• if f is closed and convex, then
f ∗∗(x) = f (x) for all x
equivalently, epi f = epi f ∗∗ (if f is closed and convex); proof on next page
Conjugate functions 5.13
Proof (by contradiction): assume f is closed and convex, and epi f ∗∗ , epi f
suppose (x, f ∗∗(x)) < epi f ; then there is a strict separating hyperplane:[ab
]T [z − x
s − f ∗∗(x)]≤ c < 0 for all (z, s) ∈ epi f
for some a, b, c with b ≤ 0 (b > 0 gives a contradiction as s→∞)
• if b < 0, define y = a/(−b) and maximize left-hand side over (z, s) ∈ epi f :
f ∗(y) − yT x + f ∗∗(x) ≤ c/(−b) < 0
this contradicts Fenchel’s inequality
• if b = 0, choose y ∈ dom f ∗ and add small multiple of (y,−1) to (a, b):[a + ε y−ε
]T [z − x
s − f ∗∗(x)]≤ c + ε
(f ∗(y) − xT y + f ∗∗(x)
)< 0
now apply the argument for b < 0
Conjugate functions 5.14
Conjugates and subgradients
if f is closed and convex, then
y ∈ ∂ f (x) ⇐⇒ x ∈ ∂ f ∗(y) ⇐⇒ xT y = f (x) + f ∗(y)
Proof. if y ∈ ∂ f (x), then f ∗(y) = supu (yTu − f (u)) = yT x − f (x); hence
f ∗(v) = supu(vTu − f (u))
≥ vT x − f (x)= xT(v − y) − f (x) + yT x
= f ∗(y) + xT(v − y)
this holds for all v; therefore, x ∈ ∂ f ∗(y)reverse implication x ∈ ∂ f ∗(y) =⇒ y ∈ ∂ f (x) follows from f ∗∗ = f
Conjugate functions 5.15
Example
−1 1x
f (x) = δ[−1,1](x)
y
f ∗(y) = |y |
1−1 x
∂ f (x)
1
−1
y
∂ f ∗(y)
Conjugate functions 5.16
Strongly convex function
Definition (page 1.18) f is µ-strongly convex (for ‖ · ‖) if dom f is convex and
f (θx + (1 − θ)y) ≤ θ f (x) + (1 − θ) f (y) − µ2θ(1 − θ)‖x − y‖2
for all x, y ∈ dom f and θ ∈ [0,1]
First-order condition
• if f is µ-strongly convex, then
f (y) ≥ f (x) + gT(y − x) + µ2‖y − x‖2 for all x, y ∈ dom f , g ∈ ∂ f (x)
• for differentiable f this is the inequality (4) on page 1.19
Conjugate functions 5.17
Proof
• recall the definition of directional derivative (page 2.28 and 2.29):
f ′(x, y − x) = infθ>0
f (x + θ(y − x)) − f (x)θ
and the infimum is approached as θ → 0
• if f is µ-strongly convex and subdifferentiable at x, then for all y ∈ dom f ,
f ′(x, y − x) ≤ infθ∈(0,1]
(1 − θ) f (x) + θ f (y) − (µ/2)θ(1 − θ)‖y − x‖2 − f (x)θ
= f (y) − f (x) − µ2‖y − x‖2
• from page 2.31, the directional derivative is the support function of ∂ f (x):
gT(y − x) ≤ supg∈∂ f (x)
gT(y − x)
= f ′(x; y − x)≤ f (y) − f (x) − µ
2‖y − x‖2
Conjugate functions 5.18
Conjugate of strongly convex function
assume f is closed and strongly convex with parameter µ > 0 for the norm ‖ · ‖
• f ∗ is defined for all y (i.e., dom f ∗ = Rn)
• f ∗ is differentiable everywhere, with gradient
∇ f ∗(y) = argmaxx(yT x − f (x))
• ∇ f ∗ is Lipschitz continuous with constant 1/µ for the dual norm ‖ · ‖∗:
‖∇ f ∗(y) − ∇ f ∗(y′)‖ ≤ 1µ‖y − y′‖∗ for all y and y′
Conjugate functions 5.19
Proof: if f is strongly convex and closed
• yT x − f (x) has a unique maximizer x for every y
• x maximizes yT x − f (x) if and only if y ∈ ∂ f (x); from page 5.15
y ∈ ∂ f (x) ⇐⇒ x ∈ ∂ f ∗(y) = {∇ f ∗(y)}
hence ∇ f ∗(y) = argmaxx (yT x − f (x))• from first-order condition on page 5.17: if y ∈ ∂ f (x), y′ ∈ ∂ f (x′):
f (x′) ≥ f (x) + yT(x′ − x) + µ2‖x′ − x‖2
f (x) ≥ f (x′) + (y′)T(x − x′) + µ2‖x′ − x‖2
combining these inequalities shows
µ‖x − x′‖2 ≤ (y − y′)T(x − x′) ≤ ‖y − y′‖∗‖x − x′‖
• now substitute x = ∇ f ∗(y) and x′ = ∇ f ∗(y′)
Conjugate functions 5.20
Outline
• closed functions
• conjugate function
• duality
Duality
primal: minimize f (x) + g(Ax)dual: maximize −g∗(z) − f ∗(−AT z)
• follows from Lagrange duality applied to reformulated primal
minimize f (x) + g(y)subject to Ax = y
dual function for the formulated problem is:
infx,y( f (x) + zT Ax + g(y) − zT y) = − f ∗(−AT z) − g∗(z)
• Slater’s condition (for convex f , g): strong duality holds if there exists an x with
x ∈ int dom f , Ax ∈ int dom g
this also guarantees that the dual optimum is attained, if optimal value is finite
Conjugate functions 5.21
Set constraint
minimize f (x)subject to Ax − b ∈ C
Primal and dual problem
primal: minimize f (x) + δC(Ax − b)dual: maximize −bT z − δ∗C(z) − f ∗(−AT z)
Examples
constraint set C support function δ∗C(z)equality Ax = b {0} 0
norm inequality ‖Ax − b‖ ≤ 1 unit ‖ · ‖-ball ‖z‖∗conic inequality Ax �K b −K δK∗(z)
Conjugate functions 5.22
Norm regularization
minimize f (x) + ‖Ax − b‖
• take g(y) = ‖y − b‖ in general problem
minimize f (x) + g(Ax)
• conjugate of ‖ · ‖ is indicator of unit ball for dual norm
g∗(z) = bT z + δB(z) where B = {z | ‖z‖∗ ≤ 1}
• hence, dual problem can be written as
maximize −bT z − f ∗(−AT z)subject to ‖z‖∗ ≤ 1
Conjugate functions 5.23
Optimality conditions
minimize f (x) + g(y)subject to Ax = y
assume f , g are convex and Slater’s condition holds
Optimality conditions: x is optimal if and only if there exists a z such that
1. primal feasibility: x ∈ dom f and y = Ax ∈ dom g
2. x and y = Ax are minimizers of the Lagrangian f (x) + zT Ax + g(y) − zT y:
−AT z ∈ ∂ f (x), z ∈ ∂g(Ax)
if g is closed, this can be written symmetrically as
−AT z ∈ ∂ f (x), Ax ∈ ∂g∗(z)
Conjugate functions 5.24
References
• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and MinimizationAlgoritms (1993), chapter X.
• D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization(2003), chapter 7.
• R. T. Rockafellar, Convex Analysis (1970).
Conjugate functions 5.25