Post on 01-Jul-2021
DC Programming:A brief tutorial.
Andres Munoz MedinaCourant Institute of Mathematical Sciences
munoz@cims.nyu.edu
Tuesday, November 12, 2013
Difference of Convex Functions
• Definition: A function is said to be DC if there exists convex functions such that
• Definition: A function is locally DC if for every there exists a neighborhood such that is DC.
fg, h f = g − h
xf |UU
Tuesday, November 12, 2013
Contents
• Motivation.
• DC functions and properties.
• DC programming and DCA.
• CCCP and convergence results.
• Global optimization algorithms.
Tuesday, November 12, 2013
Motivation
• Not all machine learning problems are convex anymore
• Transductive SVM’s [Wang et. al 2003]
• Kernel learning [Argyriou et. al 2004]
• Structure prediction [Narasinham 2012]
• Auction mechanism design [MM and Muñoz]
Tuesday, November 12, 2013
Notation
• Let be a convex function. The conjugate of is defined as
• For , denotes the subdifferential of at , i.e.
• will denote the exact subdifferential.
gg
g
g∗
∂�g(x0)� > 0
∂�g(x0) = {v ∈ Rn|g(x) ≥ g(x0) + �x− x0, v� − �
g∗(y) = supx�y, x� − g(x)
�x0
∂g(x0)
Tuesday, November 12, 2013
DC functions • A function is DC iff is the
integral of a function of bounded variation. [Hartman 59]
• A locally DC function is globally DC. [Hartman 59]
• All twice continuously differentiable functions are DC.
• Closed under sum, negation, supremum and products.
f : R → R
Tuesday, November 12, 2013
DC programming
• DC programming refers to optimization problems of the form.
• More generally, for DC functions
minx
g(x)− h(x)
minx
g(x)− h(x)
subject to fi(x) ≤ 0
fi(x)
Tuesday, November 12, 2013
Global optimality conditions.
• A point is a global solution if and only if
• Let , then a point is a global solution if and only if
x∗
x∗w∗ = infx
g(x)− h(x)
0 = infx{−h(x) + t|g(x)− t ≤ w∗}
∂�h(x∗) ⊂ ∂�g(x
∗)
Tuesday, November 12, 2013
Global optimality conditions.
• A point is a global solution if and only if
• Let , then a point is a global solution if and only if
x∗
x∗w∗ = infx
g(x)− h(x)
0 = infx{−h(x) + t|g(x)− t ≤ w∗}
∂�h(x∗) ⊂ ∂�g(x
∗)
Tuesday, November 12, 2013
Global optimality conditions.
• A point is a global solution if and only if
• Let , then a point is a global solution if and only if
x∗
x∗w∗ = infx
g(x)− h(x)
0 = infx{−h(x) + t|g(x)− t ≤ w∗}
∂�h(x∗) ⊂ ∂�g(x
∗)
Tuesday, November 12, 2013
Local optimality conditions
• If verifies , then is a strict local minimizer of x∗ ∂h(x∗) ⊂ int∂g(x∗)
g − h
Tuesday, November 12, 2013
DC duality
• By definition of conjugate function
• This is the dual of the original problem
infx
g(x)− h(x) = infx
g(x)− (supy�x, y� − h∗(y))
= infx
infyh∗(y) + g(x)− �x, y�
= infyh∗(y)− g∗(y)
Tuesday, November 12, 2013
DC algorithm.
• We want to find a sequence that decreases the function at every step.
• Use duality. If then
h∗(y)− g∗(y) = �x0, y� − h(x0)− supx(�x, y� − g(x))
= infx
g(x)− h(x) + �x0 − x, y�
≤ g(x0)− h(x0)
y ∈ ∂h(x0)
xk
Tuesday, November 12, 2013
DC algorithm
• Solve the partial problems
S(x∗) inf{h∗(y)− g∗(y) : y ∈ ∂h(x∗)}T (y∗) inf{g(x)− h(x) : x ∈ ∂g∗(y∗)}
• Choose and .• Solve concave minimization problems.• Simplified DCA
yk ∈ S(xk) xk+1 ∈ T (yk)
yk ∈ ∂h(xk) xk ∈ ∂g∗(yk)
Tuesday, November 12, 2013
DCA as CCCP
• If the function is differentiable, the simplified DCA becomes and
• Equivalent to
yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)
Tuesday, November 12, 2013
DCA as CCCP
• If the function is differentiable, the simplified DCA becomes and
• Equivalent to
yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)
Tuesday, November 12, 2013
DCA as CCCP
• If the function is differentiable, the simplified DCA becomes and
• Equivalent to
yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)
xk+1 ∈ argmin g(x)− �x,∇h(xk)�
Tuesday, November 12, 2013
DCA as CCCP
• If the function is differentiable, the simplified DCA becomes and
• Equivalent to
yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)
xk+1 ∈ argmin g(x)− �x,∇h(xk)�
xk+1 ∈ argmin g(x)− h(xk)− �x− xk,∇h(xk)�
Tuesday, November 12, 2013
CCCP as a majorization minimization algorithm
• To minimize , MM algorithms build a majorization function such that
• Do coordinate descent on
• In our scenario
f(x) ≤ F (x, y) ∀x, yf(x) = F (x, x) ∀x
F
Ff
F (x, y) = g(x)− h(y)− �x− y,∇h(y)�
Tuesday, November 12, 2013
Convergence results
• Unconstrained DC functions: Convergence to a local minimum (no rate of convergence). Bound depends on moduli of convexity. [PD Tao, LT Hoai An 97]
• Unconstrained smooth optimization: Linear or almost quadratic convergence depending on curvature [Roweis et. al 03]
• Constrained smooth optimization: Convergence without rate using Zangwill’s theory. [Lanckriet, Sriperumbudur 09]
Tuesday, November 12, 2013
Global convergence
• NP-hard in general: Minimizing a quadratic function with one negative eigenvalue with linear constraints. [Pardalos 91]
• Mostly branch and bound methods and cutting plane methods [H. Tuy 03, Horst and Thoai 99]
• Some successful results with low rank functions.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x0, t0)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x0, t0)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x0, t0)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x1, t1)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x1, t1)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x1, t1)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x2, t2)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x2, t2)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x2, t2)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x3, t3)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x3, t3)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Cutting plane algorithm[H. Tuy 03]
(x3, t3)
•All limit points of this algorithm are global minimizers• Finite convergence for
piecewise linear functions (Conjecture).•Keeps track of exponentially
many vertices.
Tuesday, November 12, 2013
Branch and bound.[Hoarst and Thoai 99]
• Main idea is to keep upper and lower bounds of on simplices .
• Upper bound: Evaluate for . Use DCA as subroutine for better bounds.
• Lower bound: If are the vertices of and . Solve
Sk
g(xk)− h(xk)xk ∈ Sk
g − h
(vi)n+1i=1 Sk
x =n+1�
i=1
αivi
minα
g��
i
αivi�−
�αih(vi)
Tuesday, November 12, 2013
Low rank optimization and polynomial guarantees[Goyal and Ravi 08]
• A function has rank if there exists and such that
• Most examples in Economy literature.
• For a quasi-concave function we want to solve .
• Can always transform DC programs to this type of problem.
f : Rn → R k � ng : Rk → R α1, . . . ,αk ∈ Rn
f(x) = g(α1 · x, . . . ,αk · x)
minx∈C
f(x)f
Tuesday, November 12, 2013
Algorithm
• Let satisfy the following conditions.
‣ The gradient
‣ for all and some
‣ for all
• There is an algorithm that finds with . in
∇g(y) ≥ 0
g(λy) ≤ λcg(y) λ > 1 c
αi · x > 0 x ∈ P
�xf(�x) ≤ (1 + �)f(x∗) O
�ck
�k
�
g
Tuesday, November 12, 2013
Further reading
• Farkas type results and duality for DC programs with convex constraints. [Dinh et. al 13]
• On DC functions and mappings [Duda et. al 01]
Tuesday, November 12, 2013
Open problems
• Local rate of convergence for constrained DC programs.
• Is there a condition under which DCA finds global optima. For instance might not be convex but might.
• Finite convergence of cutting plane methods.
g − hh∗ − g∗
Tuesday, November 12, 2013
References‣ Dinh, Nguyen, Guy Vallet, and T. T. A. Nghia. "Farkas-type results and
duality for DC programs with convex constraints." (2006).
‣ Goyal, Vineet, and R. Ravi. "An FPTAS for minimizing a class of low-rank quasi-concave functions over a convex set." Operations Research Letters 41.2 (2013): 191-196.
‣ Tao, Pham Dinh, and Le Thi Hoai An. "Convex analysis approach to DC programming: theory, algorithms and applications." Acta Mathematica Vietnamica 22.1 (1997): 289-355.
‣ Horst, R., and Ng V. Thoai. "DC programming: overview." Journal of Optimization Theory and Applications 103.1 (1999): 1-43.
‣ Tuy, H. "On global optimality conditions and cutting plane algorithms." Journal of optimization theory and applications 118.1 (2003): 201-216.
‣ Yuille, Alan L., and Anand Rangarajan. "The concave-convex procedure."Neural Computation 15.4 (2003): 915-936.
Tuesday, November 12, 2013
References
‣ Salakhutdinov, Ruslan, Sam Roweis, and Zoubin Ghahramani. "On the convergence of bound optimization algorithms." Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002.
‣ Hartman, Philip. "On functions representable as a difference of convex functions." Pacific J. Math 9.3 (1959): 707-713.
‣ Hiriart-Urruty, J-B. "Generalized Differentiability/Duality and Optimization for Problems Dealing with Differences of Convex Functions." Convexity and duality in optimization. Springer Berlin Heidelberg, 1985. 37-70.
Tuesday, November 12, 2013