Post on 17-Dec-2015
A KTEC Center of Excellence 1
Convex Optimization: Part 1 of Chapter 7 Discussion
Presenter: Brian Quanz
About today’s discussion…
• Chapter 7 – no separate discussion of convex optimization
  • Discusses it together with SVM problems
• Instead:
  • Today: discuss convex optimization
  • Next week: discuss some specific convex optimization problems (from text), e.g. SVMs
About today’s discussion…
• Mostly follow alternate text:
  • Convex Optimization, Stephen Boyd and Lieven Vandenberghe
    – Borrowed material from book and related course notes
    – Some figures and equations shown here
  • Available online: http://www.stanford.edu/~boyd/cvxbook/
  • Nice course lecture videos available from Stephen Boyd online: http://www.stanford.edu/class/ee364a/
  • Corresponding convex optimization tool (discuss later) - CVX: http://www.stanford.edu/~boyd/cvx/
Overview
• Why convex? What is convex?
• Key examples of linear and quadratic programming
• Key mathematical ideas to discuss:
  • Lagrange duality
  • KKT conditions
• Brief concept of interior point methods
• CVX – convex opt. made easy
Mathematical Optimization
• All learning is some optimization problem
  • Stick to canonical form
• x = (x1, x2, …, xp) – optimization variables; x* – optimal solution
• f0 : R^p -> R – objective function
• fi : R^p -> R – constraint functions
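The canonical form was shown as an equation on the slide and did not survive extraction. A plausible reconstruction, assuming the standard form used in Boyd and Vandenberghe with m inequality constraints (the zero right-hand side is an assumption), is:

```latex
\begin{aligned}
\text{minimize}\quad & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \dots, m
\end{aligned}
```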
Optimization Example
• Well familiar with: regularized regression
  • Least squares
  • Add some constraints: ridge, lasso
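As a reminder of what these three problems look like, here is a sketch in the usual notation. The symbols X (data matrix), y (targets), w (weights) and lambda (regularization weight) are assumptions, since the slide's own equations were lost. The penalized forms shown are equivalent to the constrained forms (a bound on the norm of w) for a suitable choice of the bound.

```latex
\begin{aligned}
\text{least squares:}\quad & \min_w \; \|Xw - y\|_2^2 \\
\text{ridge:}\quad & \min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_2^2 \\
\text{lasso:}\quad & \min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_1
\end{aligned}
```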
Why convex optimization?
• Can’t solve most optimization problems (OPs)
  • E.g. NP-hard; even high-degree polynomial time is too slow
• Convex OPs
  • (Generally) no analytic solution
  • Efficient algorithms to find the (global) solution
  • Interior point methods (basically iterated Newton) can be used:
    – ~[10-100] * max{p^3, p^2 m, F} operations; F = cost of evaluating the objective and constraint functions
  • At worst solve with general IP methods (CVX); specialized methods are faster
What is Convex Optimization?
• OP with convex objective and constraint functions
• f0, …, fm are convex = convex OP that has an efficient solution!
Convex Function
• Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points
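In symbols, the verbal definition above is usually written as follows (the weight is called theta here, which is an assumption about the slide's notation):

```latex
f(\theta x + (1 - \theta) y) \;\le\; \theta f(x) + (1 - \theta) f(y)
\qquad \text{for all } x, y \in \operatorname{dom} f,\ \theta \in [0, 1]
```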
Convex Function
• What does the definition mean?
• Pick any two points x, y and evaluate the function there: f(x), f(y)
• Draw the line segment (chord) joining the two points (x, f(x)) and (y, f(y))
• Convex if the function evaluated at any point between x and y lies on or below that chord
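The chord test above can be sketched numerically for a function of one variable. This is only a sampled check under stated assumptions: the helper name `violates_convexity` and the grid of 100 weights are illustrative, and a sampled test can disprove convexity but never prove it.

```python
import math


def violates_convexity(f, x, y, steps=100, tol=1e-12):
    """Return True if f rises above the chord from (x, f(x)) to (y, f(y))
    at any of the sampled points between x and y."""
    for k in range(steps + 1):
        theta = k / steps
        z = theta * x + (1 - theta) * y              # weighted mean of the points
        chord = theta * f(x) + (1 - theta) * f(y)    # weighted mean of the values
        if f(z) > chord + tol:
            return True
    return False


def square(v):
    return v * v


# x^2 is convex, so no sampled point lies above the chord.
square_violates = violates_convexity(square, -3.0, 5.0)

# sin is not convex on [0, pi]: sin(pi/2) = 1 lies above the chord at height ~0.
sine_violates = violates_convexity(math.sin, 0.0, math.pi)
```

Finding a single violating pair (x, y) is enough to show a function is not convex; passing the test on many pairs is only evidence, not a proof.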
Convex Function
[Figure: example function labeled "Convex!"]
[Figure: example function labeled "Not Convex!!!"]
Convex Function
• Easy to see why convexity allows for an efficient solution
• Just “slide” down the objective function as far as possible and you will reach a minimum
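The "slide down" idea can be sketched as plain gradient descent on a one-dimensional convex function. This is a minimal illustration, not the method the slides go on to use (interior point methods); the function name `gradient_descent`, the fixed step size and the example objective (x - 3)^2 + 1 are all assumptions made for the sketch.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Follow the negative gradient from x0 until the slope is (nearly) zero."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:      # flat: for a convex function this is a global minimum
            break
        x -= step * g         # move downhill
    return x


def objective_gradient(x):
    """Gradient of the convex example objective f(x) = (x - 3)^2 + 1."""
    return 2.0 * (x - 3.0)


# Two very different starting points slide down to the same minimizer, x = 3.
from_left = gradient_descent(objective_gradient, x0=-10.0)
from_right = gradient_descent(objective_gradient, x0=25.0)
```

Because the objective is convex, the starting point does not matter; on a non-convex function the same loop can stop in different local minima.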
Local Optimum is Global (simple proof)
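The proof itself was on the slide as an image and is not in this transcript. A standard argument, given here as a sketch rather than as the presenter's exact wording, goes by contradiction:

```latex
\begin{aligned}
&\text{Suppose } x \text{ is locally optimal but some feasible } y \text{ has } f_0(y) < f_0(x). \\
&\text{For } \theta \in (0, 1] \text{ let } z = \theta y + (1 - \theta) x,
  \text{ which is feasible because the feasible set is convex.} \\
&\text{By convexity: } f_0(z) \le \theta f_0(y) + (1 - \theta) f_0(x) < f_0(x). \\
&\text{As } \theta \to 0,\ z \to x, \text{ so every neighborhood of } x
  \text{ contains a strictly better feasible point, a contradiction.}
\end{aligned}
```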
Convex vs. Non-convex Ex.
• Convex, min. easy to find
• Affine – border case of convexity
Convex vs. Non-convex Ex.
• Non-convex, easy to get stuck in a local min.
• Can’t rely on only local search techniques
Non-convex
• Some non-convex problems are highly multi-modal, or NP-hard
• Could be forced to search all solutions, or hope stochastic search is successful
• Cannot guarantee best solution; inefficient
• Harder to make performance guarantees with approximate solutions
Determine/Prove Convexity
• Can use the definition (prove it holds) to prove convexity
  • If the function restricted to any line is convex, the function is convex
  • If twice differentiable, show Hessian >= 0 (positive semidefinite)
• Often easier to:
  • Convert to a known convex OP
    – E.g. QP, LP, SOCP, SDP, often of a more general form
  • Combine known convex functions (building blocks) using operations that preserve convexity
    – Similar idea to building kernels
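The second-order test can be sketched for a function of two variables, assuming the Hessian has already been worked out by hand. The helper name `is_psd_2x2` and the two example functions are illustrative and not from the slides. For a symmetric 2x2 matrix, positive semidefiniteness is equivalent to both diagonal entries and the determinant being nonnegative.

```python
def is_psd_2x2(h, tol=1e-12):
    """Check that a symmetric 2x2 matrix [[a, b], [b, c]] is positive semidefinite."""
    (a, b), (b_lower, c) = h
    if abs(b - b_lower) > tol:
        raise ValueError("Hessian must be symmetric")
    # All principal minors nonnegative: a, c, and the determinant.
    return a >= -tol and c >= -tol and a * c - b * b >= -tol


# f(x1, x2) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]]: convex.
hessian_convex = [[2.0, 1.0], [1.0, 2.0]]

# f(x1, x2) = x1*x2 has constant Hessian [[0, 1], [1, 0]]: not convex.
hessian_saddle = [[0.0, 1.0], [1.0, 0.0]]

convex_ok = is_psd_2x2(hessian_convex)
saddle_ok = is_psd_2x2(hessian_saddle)
```

For functions whose Hessian depends on x, the same check has to hold at every point of the domain, which is why converting to a known convex form is often easier.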
Some common convex OPs
• Of particular interest for this book and chapter:
  • Linear programming (LP) and quadratic programming (QP)
• LP: affine objective function, affine constraints
  – e.g. LP SVM, portfolio management
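The LP equation from the slide is missing here. A sketch of the general form as written in Boyd and Vandenberghe (the symbols c, d, G, h, A, b are that book's notation and are assumed, not taken from the slide):

```latex
\begin{aligned}
\text{minimize}\quad & c^T x + d \\
\text{subject to}\quad & G x \preceq h \\
& A x = b
\end{aligned}
```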
LP Visualization
Note: the constraints form the feasible set; for an LP, a polyhedron
Quadratic Program
• QP: (convex) quadratic objective, affine constraints
• LP is a special case
• Many SVM problems result in a QP, as does regression
• If the constraint functions are also (convex) quadratic, then Quadratically Constrained Quadratic Program (QCQP)
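For reference, a sketch of the two forms in the notation of Boyd and Vandenberghe (the symbols P, q, r, G, h, A, b are assumed from that book, since the slide's equations were lost). Convexity requires P and each P_i to be positive semidefinite.

```latex
\begin{aligned}
\text{QP:}\quad & \text{minimize}\ \ \tfrac{1}{2} x^T P x + q^T x + r
  \quad \text{subject to}\ \ G x \preceq h,\ \ A x = b \\
\text{QCQP:}\quad & \text{minimize}\ \ \tfrac{1}{2} x^T P_0 x + q_0^T x + r_0 \\
& \text{subject to}\ \ \tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0,\ \ i = 1, \dots, m,
  \quad A x = b
\end{aligned}
```

Setting P = 0 in the QP recovers the LP, which is the "special case" noted above.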
QP Visualization
Second Order Cone Program
• Ai = 0 – results in LP
• ci = 0 – results in QCQP
• Constraint requires the affine functions to lie in the 2nd order cone
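The SOCP equation is missing from the transcript. The bullets about A_i and c_i match the form given in Boyd and Vandenberghe, so a likely reconstruction (with f, F, g, b_i, d_i assumed from that book) is:

```latex
\begin{aligned}
\text{minimize}\quad & f^T x \\
\text{subject to}\quad & \|A_i x + b_i\|_2 \le c_i^T x + d_i, \quad i = 1, \dots, m \\
& F x = g
\end{aligned}
```

With A_i = 0 each constraint becomes a linear inequality (an LP); with c_i = 0 squaring both sides gives a convex quadratic constraint (a QCQP).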
Second Order Cone (Boundary) in R^3
Semidefinite Programming
• Linear matrix inequality (LMI) constraints
• Many problems can be expressed using LMIs
  • E.g. LP and SOCP
Semidefinite Programming
Building Convex Functions
• From simple convex functions to complex: some operations that preserve convexity
  • Nonnegative weighted sum
  • Composition with affine function
  • Pointwise maximum and supremum
  • Composition
  • Minimization
  • Perspective ( g(x,t) = t f(x/t) )
Verifying Convexity Remarks
• For more detail and expansion, consult the referenced text, Convex Optimization
• Geometric programs are also convex after a change of variables; can be handled with a series of SDPs (details skipped here)
• CVX converts the problem either to an SOCP or an SDP (or a series of these) and uses an efficient solver
Lagrangian
• Standard form:
• Lagrangian L:
• Lambda, nu: Lagrange multipliers (dual variables)
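The two equations on this slide did not survive extraction. A sketch of what they most likely were, following Boyd and Vandenberghe. The count k of equality constraints is a label introduced here to avoid clashing with p, which these slides use for the dimension of x.

```latex
\begin{aligned}
\text{Standard form:}\quad & \text{minimize}\ \ f_0(x) \\
& \text{subject to}\ \ f_i(x) \le 0,\ \ i = 1, \dots, m,
  \qquad h_i(x) = 0,\ \ i = 1, \dots, k \\[4pt]
\text{Lagrangian:}\quad & L(x, \lambda, \nu) = f_0(x)
  + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{k} \nu_i h_i(x)
\end{aligned}
```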
Lagrange Dual Function
• Lagrange dual found by minimizing L with respect to the primal variables
  • Often can take the gradient of L w.r.t. the primal variables and set it = 0 (SVM)
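Written out, with L the Lagrangian of the standard-form problem, the dual function described above is the following (a sketch in the usual notation; the stationarity shortcut applies when L is convex and differentiable in x):

```latex
g(\lambda, \nu) = \inf_{x} L(x, \lambda, \nu),
\qquad \text{and if } L \text{ is convex and differentiable in } x:\quad
\nabla_x L(x, \lambda, \nu) = 0 \ \text{ at the minimizing } x
```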
Lagrange Dual Function
• Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (lambda, nu)
• Thus, g is concave even if the problem is not convex
Lagrange Dual Function
• Lagrange dual provides a lower bound on the objective value at the solution (for any lambda >= 0)
Lagrangian as Linear Approximation, Lower Bound
• Simple interpretation of the Lagrangian
• Can incorporate the constraints into the objective as indicator functions
  • Infinity if violated, 0 otherwise: I(u) = 0 if u <= 0, infinity if u > 0
• In the Lagrangian we use a “soft” linear approximation to the indicator functions; it is an underestimator since lambda_i * u <= I(u) for all u when lambda_i >= 0
Lagrange Dual Problem
• Why not make the lower bound the best possible?
• Dual problem:
• Always a convex opt. problem (even when the primal is non-convex)
• Weak duality: d* <= p* (have already seen this)
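The dual problem is to maximize g(lambda, nu) subject to lambda >= 0. A tiny worked example, chosen here for illustration and not taken from the slides: minimize x^2 subject to 1 - x <= 0. The primal optimum is p* = 1 at x = 1, and setting the gradient of the Lagrangian to zero gives x = lambda / 2. The grid search over lambda is only a stand-in for a real solver.

```python
def lagrangian(x, lam):
    """L(x, lam) = f0(x) + lam * f1(x) with f0(x) = x^2 and f1(x) = 1 - x."""
    return x * x + lam * (1.0 - x)


def dual_function(lam):
    """g(lam) = min over x of L(x, lam); dL/dx = 2x - lam = 0 gives x = lam / 2."""
    x_min = lam / 2.0
    return lagrangian(x_min, lam)


p_star = 1.0                                    # primal optimum, attained at x = 1
lambdas = [k / 100.0 for k in range(0, 501)]    # lam in [0, 5], all feasible (>= 0)

# Weak duality: every dual value is a lower bound on p*.
dual_values = [dual_function(lam) for lam in lambdas]

# Dual problem: pick the best lower bound.
d_star, lam_star = max((dual_function(lam), lam) for lam in lambdas)
```

Here the best lower bound equals p*, so strong duality holds for this example, as expected for a convex problem with a strictly feasible point such as x = 2.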
Strong Duality
• If d* = p*, strong duality holds
• Does not hold in general
• Slater’s Theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved, refer to text)
• => For convex problems, can use the dual problem to find the solution
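Complementary Slackness
• When strong duality holds:
    f0(x*) = g(lambda*, nu*)
           = inf_x ( f0(x) + sum_i lambda_i* fi(x) + sum_i nu_i* hi(x) )   (definition)
          <= f0(x*) + sum_i lambda_i* fi(x*) + sum_i nu_i* hi(x*)
          <= f0(x*)   (since constraints satisfied at x*)
• Sandwiched between f0(x*), so the last 2 inequalities are equalities, simple!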
Complementary Slackness
• Which means: sum_i lambda_i* fi(x*) = 0
• Since each term is non-positive, we have complementary slackness: lambda_i* fi(x*) = 0, i = 1, …, m
• Whenever a constraint is non-active, the corresponding multiplier is zero
Complementary Slackness
• This can also be described by: lambda_i* > 0 => fi(x*) = 0, or equivalently fi(x*) < 0 => lambda_i* = 0
• Since usually only a few constraints are active at the solution (see geometry), the dual variable lambda is often sparse
  • Note: in general no guarantee
Complementary Slackness
• As we will see, this is why support vector machines result in a solution with only key support vectors
  • These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the “active” points are kept
Complementary Slackness
• However, avoid common misconceptions when it comes to SVM and complementary slackness!
• E.g. if Lagrange multiplier is 0, constraint
could still be active! (not bijection!)
• This means:
KKT Conditions
• The KKT conditions are then just what we call that set of conditions required at the solution (basically list what we know)
• KKT conditions play an important role
  • Can sometimes be used to find the solution analytically
  • Otherwise can think of many methods as ways of solving the KKT conditions
KKT Conditions
• Again given strong duality and assuming differentiable functions: since x* minimizes L(x, lambda*, nu*) over x, the gradient of L w.r.t. x must be 0 at x*
• Thus, putting it all together, for non-convex problems we have
KKT Conditions – non-convex
• Necessary conditions
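The list of conditions was an equation on the slide and is missing here. A sketch of the five conditions in the order used by Boyd and Vandenberghe, with the numbering 1 to 5 added here for reference:

```latex
\begin{aligned}
1.\quad & f_i(x^*) \le 0, && i = 1, \dots, m \\
2.\quad & h_i(x^*) = 0, && i = 1, \dots, k \\
3.\quad & \lambda_i^* \ge 0, && i = 1, \dots, m \\
4.\quad & \lambda_i^* f_i(x^*) = 0, && i = 1, \dots, m \\
5.\quad & \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*)
          + \sum_{i=1}^{k} \nu_i^* \nabla h_i(x^*) = 0
\end{aligned}
```

Conditions 1 and 2 are primal feasibility, 3 is dual feasibility, 4 is complementary slackness, and 5 is the zero-gradient condition on the Lagrangian.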
KKT Conditions – convex
• Also sufficient conditions (x~, lambda~, nu~ denote any points satisfying the conditions):
  • 1+2 -> x~ is feasible
  • 3 -> L(x, lambda~, nu~) is convex in x
  • 5 -> x~ minimizes L(x, lambda~, nu~), so g(lambda~, nu~) = L(x~, lambda~, nu~)
Brief description of interior point method
• Solve a series of equality constrained problems with Newton’s method
• Approximate constraints with log-barrier (approx. of indicator)
Brief description of interior point method
• As t gets larger, approximation becomes better
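The barrier idea can be sketched on a toy problem chosen for illustration (not from the slides): minimize x subject to x >= 1, whose solution is x = 1. For each t the constraint is replaced by a log-barrier, giving the unconstrained objective t*x - log(x - 1), which is minimized with damped Newton steps. Its exact minimizer is x = 1 + 1/t, so the points approach the true solution as t grows. Function names, the starting point and the factor mu = 10 are all assumptions of this sketch.

```python
import math


def barrier_objective(x, t):
    """t * f0(x) plus the log-barrier for the constraint 1 - x <= 0."""
    return t * x - math.log(x - 1.0)


def newton_centering(x, t, tol=1e-8, max_iter=100):
    """Minimize the barrier objective for fixed t, starting from a strictly feasible x."""
    for _ in range(max_iter):
        grad = t - 1.0 / (x - 1.0)
        hess = 1.0 / (x - 1.0) ** 2
        if grad * grad / hess < tol:          # squared Newton decrement is small
            break
        step = -grad / hess
        s = 1.0
        # Backtrack until the new point is strictly feasible and decreases enough.
        while (x + s * step <= 1.0 or
               barrier_objective(x + s * step, t)
               > barrier_objective(x, t) + 0.25 * s * grad * step):
            s *= 0.5
        x += s * step
    return x


def barrier_method(x0=2.0, t0=1.0, mu=10.0, eps=1e-4):
    """Follow the central path: re-center, then increase t by the factor mu."""
    x, t = x0, t0
    path = []
    while True:
        x = newton_centering(x, t)
        path.append((t, x))
        if 1.0 / t < eps:                     # gap bound m / t with m = 1 constraint
            return x, path
        t *= mu


x_final, central_path = barrier_method()
```

Each entry of `central_path` is a pair (t, x) close to (t, 1 + 1/t), which makes the statement above concrete: larger t means a tighter approximation of the indicator and a point closer to the constrained optimum.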
Central Path Idea
CVX: Convex Optimization Made Easy
• CVX is a Matlab toolbox
  • Allows you to flexibly express convex optimization problems
  • Translates these to a general form and uses an efficient solver (SOCP, SDP, or a series of these)
  • http://www.stanford.edu/~boyd/cvx/
• All you have to do is design the convex optimization problem
  • Plug into CVX, and a first version of the algorithm is implemented
  • A more specialized solver may be necessary for some applications
CVX - Examples
• Quadratic program: given H, f, A, and b
cvx_begin
    variable x(n)
    % H must be positive semidefinite for the objective to be convex
    minimize( x'*H*x + f'*x )
    subject to
        A*x >= b
cvx_end
CVX - Examples
• SVM-type formulation with L1 norm
cvx_begin
    variable w(p)
    variable b(1)
    variable e(n)
    expression by(n)
    by = train_label.*b;
    minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
    subject to
        X*w + by >= a - e;
        e >= ec;
cvx_end
CVX - Examples
• More complicated terms built with expressions
cvx_begin
    variable w(p+1+n);
    expression q(ec);
    for i = 1:p
        for j = i:p
            if (A(i,j) == 1)
                q(ct) = max(abs(w(i))/d(i), abs(w(j))/d(j));
                ct = ct + 1;
            end
        end
    end
    minimize( f'*w + lambda*sum(q) )
    subject to
        X*w >= a;
cvx_end
Questions
• Questions, Comments?
Extra proof