Convex Optimization


CONVEX OPTIMIZATION

Muhammad Adil Raja

Roaming Researchers, Inc.

August 31, 2014


OUTLINE

INTRODUCTION

CONVEX SETS

CONVEX FUNCTIONS

CONVEX OPTIMIZATION PROBLEMS

REFERENCES


INTRODUCTION I

- Many situations arise in machine learning where we would like to optimize the value of some function.

- That is, given a function f : Rn → R, we want to find x ∈ Rn that minimizes (or maximizes) f(x).

- In the general case, finding the global optimum of a function can be a very difficult task.

- However, for a special class of optimization problems, known as convex optimization problems, we can efficiently find the global solution in many cases.

- Here, "efficiently" has both practical and theoretical connotations:

1. It means that we can solve many real-world problems in a reasonable amount of time.


INTRODUCTION II

2. And it means that theoretically we can solve problems in time that depends only polynomially on the problem size.

- The goal of these section notes and the accompanying lecture is to give a very brief overview of the field of convex optimization.


CONVEX SETS I

DEFINITION

A set C is convex if, for any x, y ∈ C and θ ∈ R with 0 ≤ θ ≤ 1, θx + (1 − θ)y ∈ C.

- Intuitively, this means that if we take any two elements in C, and draw a line segment between these two elements, then every point on that line segment also belongs to C.

- Figure 1 shows an example of one convex and one non-convex set.

- The point θx + (1 − θ)y is called a convex combination of the points x and y.
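This definition is easy to test numerically for a concrete set. The sketch below is a hand-rolled check (taking the Euclidean unit disk in R2 as an illustrative choice of C): it verifies that convex combinations of two members of the set stay inside it.

```python
import math
import random

def in_unit_disk(p):
    # Membership test for C = {p in R^2 : ||p||_2 <= 1}
    return math.hypot(p[0], p[1]) <= 1.0 + 1e-12

def convex_combination(x, y, theta):
    # theta*x + (1 - theta)*y, computed componentwise
    return tuple(theta * xi + (1 - theta) * yi for xi, yi in zip(x, y))

random.seed(0)
x, y = (0.6, 0.3), (-0.5, 0.4)   # two points inside the disk
assert in_unit_disk(x) and in_unit_disk(y)
for _ in range(1000):
    theta = random.random()      # theta sampled uniformly from [0, 1)
    assert in_unit_disk(convex_combination(x, y, theta))
```

A set for which this loop could fail for some pair of members (e.g., an annulus) is, by definition, not convex.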


CONVEX SETS II


Figure 1: Examples of a convex set (a) and a non-convex set (b).



Examples

- All of Rn. It should be fairly obvious that given any x, y ∈ Rn, θx + (1 − θ)y ∈ Rn.

- The non-negative orthant, Rn+. The non-negative orthant consists of all vectors in Rn whose elements are all non-negative:


CONVEX SETS III

1. Rn+ = {x : xi ≥ 0 ∀i = 1, ..., n}.

2. To show that this is a convex set, simply note that given any x, y ∈ Rn+ and 0 ≤ θ ≤ 1, (θx + (1 − θ)y)i = θxi + (1 − θ)yi ≥ 0 ∀i.

- Norm balls. Let || · || be some norm on Rn (e.g., the Euclidean norm, ||x||2 = √(∑i xi²)). Then the set {x : ||x|| ≤ 1} is a convex set.

- To see this, suppose x, y ∈ Rn with ||x|| ≤ 1, ||y|| ≤ 1, and 0 ≤ θ ≤ 1. Then ||θx + (1 − θ)y|| ≤ ||θx|| + ||(1 − θ)y|| = θ||x|| + (1 − θ)||y|| ≤ 1, where we used the triangle inequality and the positive homogeneity of norms.


CONVEX SETS IV

- Affine subspaces and polyhedra. Given a matrix A ∈ Rm×n and a vector b ∈ Rm, an affine subspace is the set {x ∈ Rn : Ax = b} (note that this could possibly be empty if b is not in the range of A). Similarly, a polyhedron is the (again, possibly empty) set {x ∈ Rn : Ax ⪯ b}, where "⪯" here denotes componentwise inequality (i.e., all the entries of Ax are less than or equal to their corresponding element in b). To prove convexity, first consider x, y ∈ Rn such that Ax = Ay = b. Then for 0 ≤ θ ≤ 1, A(θx + (1 − θ)y) = θAx + (1 − θ)Ay = θb + (1 − θ)b = b. Similarly, for x, y ∈ Rn that satisfy Ax ⪯ b and Ay ⪯ b and 0 ≤ θ ≤ 1, A(θx + (1 − θ)y) = θAx + (1 − θ)Ay ⪯ θb + (1 − θ)b = b.
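The polyhedron argument can be mirrored in code. A minimal sketch (using a small hand-picked polyhedron {x : Ax ⪯ b} in R2 as an illustrative example) checks that convex combinations of two feasible points remain feasible:

```python
def matvec(A, x):
    # Matrix-vector product for a matrix given as a list of rows
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def in_polyhedron(A, b, x, tol=1e-12):
    # Membership test for {x : Ax <= b} (componentwise)
    return all(v <= bi + tol for v, bi in zip(matvec(A, x), b))

# Triangle: x1 + x2 <= 1, x1 >= 0, x2 >= 0 (written as -x1 <= 0, -x2 <= 0)
A = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 0.0, 0.0]

x, y = [0.2, 0.3], [0.5, 0.1]            # two feasible points
assert in_polyhedron(A, b, x) and in_polyhedron(A, b, y)
for k in range(11):
    theta = k / 10
    z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
    assert in_polyhedron(A, b, z)        # every combination stays feasible
```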


CONVEX SETS V

- Intersections of convex sets. Suppose C1, C2, ..., Ck are convex sets. Then their intersection ∩i Ci = {x : x ∈ Ci ∀i = 1, ..., k} is also a convex set. To see this, consider x, y ∈ ∩i Ci and 0 ≤ θ ≤ 1. Then θx + (1 − θ)y ∈ Ci ∀i = 1, ..., k by the definition of a convex set, and therefore θx + (1 − θ)y ∈ ∩i Ci. Note, however, that the union of convex sets will in general not be convex.

- Positive semidefinite matrices. The set of all symmetric positive semidefinite matrices, often called the positive semidefinite cone and denoted Sn+, is a convex set (in general, Sn ⊂ Rn×n denotes the set of symmetric n × n matrices). Recall that a matrix A ∈ Rn×n is symmetric positive semidefinite if and only if A = AT and for all x ∈ Rn, xT Ax ≥ 0. Now consider two symmetric positive semidefinite


CONVEX SETS VI

matrices A, B ∈ Sn+ and 0 ≤ θ ≤ 1. Then for any x ∈ Rn, xT (θA + (1 − θ)B)x = θ xT Ax + (1 − θ) xT Bx ≥ 0. The same logic can be used to show that the sets of all positive definite, negative definite, and negative semidefinite matrices are each also convex.


CONVEX FUNCTIONS I

DEFINITION

A function f : Rn → R is convex if its domain (denoted D(f)) is a convex set, and if, for all x, y ∈ D(f) and θ ∈ R with 0 ≤ θ ≤ 1, f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).

- Intuitively, the way to think about this definition is that if we pick any two points on the graph of a convex function and draw a straight line between them, then the portion of the function between these two points will lie below this straight line. This situation is pictured in Figure 2.

- We say a function is strictly convex if this definition holds with strict inequality for x ≠ y and 0 < θ < 1. We say that f is concave if −f is convex, and likewise that f is strictly concave if −f is strictly convex.
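The defining inequality can be brute-force checked on a grid of points and θ values. The helper below is a numerical sanity check, and the sample functions are illustrative choices, not taken from the slides:

```python
def convex_on_grid(f, xs, thetas, tol=1e-12):
    # Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    # over all pairs of sample points and all sampled theta in [0, 1].
    return all(
        f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + tol
        for x in xs for y in xs for t in thetas
    )

xs = [i / 10 - 3 for i in range(61)]      # grid on [-3, 3]
thetas = [k / 20 for k in range(21)]      # theta in {0, 0.05, ..., 1}
assert convex_on_grid(lambda x: x * x, xs, thetas)       # convex
assert not convex_on_grid(lambda x: -x * x, xs, thetas)  # concave, not convex
```

A passing grid check does not prove convexity, but a failing one disproves it, which makes this a handy debugging tool.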


CONVEX FUNCTIONS II

Figure 2: Graph of a convex function. By the definition of convex functions, the line connecting two points on the graph must lie above the function.




FIRST ORDER CONDITION FOR CONVEXITY I

- Suppose a function f : Rn → R is differentiable.

- Then f is convex if and only if D(f) is a convex set and for all x, y ∈ D(f), f(y) ≥ f(x) + ∇x f(x)T (y − x).

- The function f(x) + ∇x f(x)T (y − x) is called the first-order approximation to the function f at the point x.

- Intuitively, this can be thought of as approximating f with its tangent line at the point x.

- The first order condition for convexity says that f is convex if and only if the tangent line is a global underestimator of the function f.

- In other words, if we take our function and draw a tangent line at any point, then every point on this line will lie below the corresponding point on f.

- A function is differentiable if the gradient ∇x f(x) exists at all points x in the domain of f.
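The "global underestimator" property can be verified directly for a simple differentiable convex function; here f(x) = e^x serves as an illustrative choice:

```python
import math

# f(x) = e^x is convex, so its tangent at any point x0,
# f(x0) + f'(x0)*(y - x0), must underestimate f everywhere.
def tangent_at(x0):
    f0, g0 = math.exp(x0), math.exp(x0)   # value and derivative at x0
    return lambda y: f0 + g0 * (y - x0)

points = [-3.0, -1.0, 0.0, 0.5, 2.0]
for x0 in points:
    t = tangent_at(x0)
    for y in points:
        # tangent line lies below the function (equality only at y = x0)
        assert math.exp(y) >= t(y) - 1e-12
```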


FIRST ORDER CONDITION FOR CONVEXITY II

- The gradient is defined as ∇x f(x) ∈ Rn, (∇x f(x))i = ∂f(x)/∂xi.

- Similar to the definition of convexity, f will be strictly convex if this holds with strict inequality, concave if the inequality is reversed, and strictly concave if the reverse inequality is strict.


FIRST ORDER CONDITION FOR CONVEXITY III


Figure 3: Illustration of the first-order condition for convexity.


SECOND ORDER CONDITION FOR CONVEXITY I

- Suppose a function f : Rn → R is twice differentiable.

- Then f is convex if and only if D(f) is a convex set and its Hessian is positive semidefinite: i.e., for any x ∈ D(f), ∇²x f(x) ⪰ 0.

- Here, the notation "⪰" when used in conjunction with matrices refers to positive semidefiniteness, rather than componentwise inequality.

- In one dimension, this is equivalent to the condition that the second derivative f''(x) always be non-negative (i.e., the function always has non-negative curvature).
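For n = 2 the Hessian test is easy to code by hand: a symmetric 2 × 2 matrix [[a, b], [b, c]] is positive semidefinite iff a ≥ 0, c ≥ 0, and ac − b² ≥ 0. The sample functions below are illustrative choices:

```python
def is_psd_2x2(a, b, c):
    # Symmetric [[a, b], [b, c]] is PSD iff a >= 0, c >= 0, a*c - b*b >= 0
    return a >= 0 and c >= 0 and a * c - b * b >= 0

# f(x) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]] -> convex
assert is_psd_2x2(2.0, 1.0, 2.0)

# g(x) = x1^2 - 3*x1*x2 + x2^2 has Hessian [[2, -3], [-3, 2]];
# its determinant is 4 - 9 < 0, so g is not convex
assert not is_psd_2x2(2.0, -3.0, 2.0)
```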


SECOND ORDER CONDITION FOR CONVEXITY II

- Again analogous to both the definition and first order conditions for convexity, f is strictly convex if its Hessian is positive definite, concave if the Hessian is negative semidefinite, and strictly concave if the Hessian is negative definite.

- A function is twice differentiable if the Hessian ∇²x f(x) is defined for all points x in the domain of f.

- The Hessian is defined as ∇²x f(x) ∈ Rn×n, (∇²x f(x))ij = ∂²f(x)/∂xi∂xj.

- For a symmetric matrix X ∈ Sn, X ⪯ 0 denotes that X is negative semidefinite.

- As with vector inequalities, "≤" and "≥" are sometimes used in place of "⪯" and "⪰".


JENSEN’S INEQUALITY I

- Suppose we start with the inequality in the basic definition of a convex function: f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) for 0 ≤ θ ≤ 1.

- Using induction, this can be fairly easily extended to convex combinations of more than two points: f(∑i θi xi) ≤ ∑i θi f(xi) for ∑i θi = 1, θi ≥ 0 ∀i.

- In fact, this can also be extended to infinite sums or integrals.

- In the latter case, the inequality can be written as f(∫ p(x) x dx) ≤ ∫ p(x) f(x) dx for ∫ p(x) dx = 1, p(x) ≥ 0 ∀x.

- Because p(x) integrates to 1, it is common to consider it as a probability density, in which case the previous equation can be written in terms of expectations: f(E[x]) ≤ E[f(x)].

- This last inequality is known as Jensen's inequality.
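The expectation form can be sanity-checked by Monte Carlo. For the convex function f(t) = t², the gap E[f(x)] − f(E[x]) over a finite sample is exactly the sample variance, so the inequality must hold for any sample:

```python
import random

# Empirical check of Jensen's inequality f(E[x]) <= E[f(x)] for f(t) = t^2,
# with expectations taken over a finite random sample.
random.seed(42)
sample = [random.gauss(1.0, 2.0) for _ in range(10_000)]

f = lambda t: t * t
mean = sum(sample) / len(sample)
f_of_mean = f(mean)                                # f(E[x])
mean_of_f = sum(f(t) for t in sample) / len(sample)  # E[f(x)]
assert f_of_mean <= mean_of_f
```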


SUBLEVEL SETS I

- Convex functions give rise to a particularly important type of convex set called an α-sublevel set.

- Given a convex function f : Rn → R and a real number α ∈ R, the α-sublevel set is defined as {x ∈ D(f) : f(x) ≤ α}.

- In other words, the α-sublevel set is the set of all points x such that f(x) ≤ α.

- To show that this is a convex set, consider any x, y ∈ D(f) such that f(x) ≤ α and f(y) ≤ α.

- Then, for 0 ≤ θ ≤ 1, f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) ≤ θα + (1 − θ)α = α.


EXAMPLES OF CONVEX FUNCTIONS I

- We begin with a few simple examples of convex functions of one variable, then move on to multivariate functions.

- Exponential. Let f : R → R, f(x) = e^(ax) for any a ∈ R. To show f is convex, we can simply take the second derivative f''(x) = a² e^(ax), which is positive for all x.

- Negative logarithm. Let f : R → R, f(x) = −log x with domain D(f) = R++.

- Here, R++ denotes the set of strictly positive real numbers, {x : x > 0}.

- Then f''(x) = 1/x² > 0 for all x.

- Affine functions. Let f : Rn → R, f(x) = bT x + c for some b ∈ Rn, c ∈ R.

- In this case the Hessian, ∇²x f(x) = 0 for all x.


EXAMPLES OF CONVEX FUNCTIONS II

- Because the zero matrix is both positive semidefinite and negative semidefinite, f is both convex and concave.

- In fact, affine functions of this form are the only functions that are both convex and concave.

- Quadratic functions. Let f : Rn → R, f(x) = (1/2) xT Ax + bT x + c for a symmetric matrix A ∈ Sn, b ∈ Rn and c ∈ R.

- The Hessian for this function is given by ∇²x f(x) = A.

- Therefore, the convexity or non-convexity of f is determined entirely by whether or not A is positive semidefinite: if A is positive semidefinite then the function is convex (and analogously for strictly convex, concave, strictly concave).


EXAMPLES OF CONVEX FUNCTIONS III

- If A is indefinite then f is neither convex nor concave.

- Note that the squared Euclidean norm f(x) = ||x||2² = xT x is a special case of quadratic functions where A = 2I, b = 0, c = 0, so it is therefore a strictly convex function.

- Norms. Let f : Rn → R be some norm on Rn. Then by the triangle inequality and positive homogeneity of norms, for x, y ∈ Rn, 0 ≤ θ ≤ 1, f(θx + (1 − θ)y) ≤ f(θx) + f((1 − θ)y) = θf(x) + (1 − θ)f(y).

- This is an example of a convex function where it is not possible to prove convexity based on the second or first order conditions.

- The reason is that norms are not generally differentiable everywhere (e.g., the 1-norm, ||x||1 = ∑i |xi|, is non-differentiable at all points where any xi is equal to zero).


EXAMPLES OF CONVEX FUNCTIONS IV

- Nonnegative weighted sums of convex functions. Let f1, f2, ..., fk be convex functions and w1, w2, ..., wk be nonnegative real numbers. Then f(x) = ∑i wi fi(x) is a convex function, since

  f(θx + (1 − θ)y) = ∑i wi fi(θx + (1 − θ)y)
                   ≤ ∑i wi (θ fi(x) + (1 − θ) fi(y))
                   = θ ∑i wi fi(x) + (1 − θ) ∑i wi fi(y)
                   = θ f(x) + (1 − θ) f(y).
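This chain of (in)equalities can be spot-checked numerically. The two convex functions and the weights below are illustrative choices:

```python
def convex_combo_check(f, x, y, theta, tol=1e-12):
    # Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    return f(theta * x + (1 - theta) * y) <= theta * f(x) + (1 - theta) * f(y) + tol

# Two convex functions of one variable and nonnegative weights
f1 = lambda x: x * x
f2 = lambda x: abs(x)
w1, w2 = 2.0, 0.5
f = lambda x: w1 * f1(x) + w2 * f2(x)   # nonnegative weighted sum

pts = [i / 4 - 2 for i in range(17)]     # grid on [-2, 2]
thetas = [k / 10 for k in range(11)]
for x in pts:
    for y in pts:
        for t in thetas:
            assert convex_combo_check(f, x, y, t)
```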


CONVEX OPTIMIZATION PROBLEMS I

- Armed with the definitions of convex functions and sets, we are now equipped to consider convex optimization problems.

- Formally, a convex optimization problem is an optimization problem of the form: minimize f(x) subject to x ∈ C,

- where f is a convex function, C is a convex set, and x is the optimization variable.

- However, since this can be a little bit vague, we often write it as: minimize f(x) subject to gi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p,

- where f is a convex function, gi are convex functions, hi are affine functions, and x is the optimization variable.


CONVEX OPTIMIZATION PROBLEMS II

- It is important to note the direction of these inequalities:

1. A convex function gi must be less than or equal to zero.

2. This is because the 0-sublevel set of gi is a convex set, so the feasible region, which is the intersection of many convex sets, is also convex (recall that affine subspaces are convex sets as well).

3. If we were to require that gi ≥ 0 for some convex gi, the feasible region would no longer be a convex set, and the algorithms we apply for solving these problems would no longer be guaranteed to find the global optimum.

4. Also notice that only affine functions are allowed to be equality constraints.

5. Intuitively, you can think of this as being due to the fact that an equality constraint is equivalent to the two inequalities hi ≤ 0 and hi ≥ 0.


CONVEX OPTIMIZATION PROBLEMS III

6. However, these will both be valid constraints if and only if hi is both convex and concave, i.e., hi must be affine.

- The optimal value of an optimization problem is denoted p* (or sometimes f*) and is equal to the minimum possible value of the objective function in the feasible region: p* = min{f(x) : gi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p}.

- We allow p* to take on the values +∞ and −∞ when the problem is either infeasible (the feasible region is empty) or unbounded below (there exist feasible points such that f(x) → −∞), respectively.

- We say that x* is an optimal point if f(x*) = p*.

- Note that there can be more than one optimal point, even when the optimal value is finite.


GLOBAL OPTIMALITY IN CONVEX PROBLEMS I

- Definitions of the concepts of local optima and global optima are in order.

- Intuitively, a feasible point is called locally optimal if there are no "nearby" feasible points that have a lower objective value.

- Similarly, a feasible point is called globally optimal if there are no feasible points at all that have a lower objective value.

- To formalize this a little bit more, the following definitions are important.


GLOBAL OPTIMALITY IN CONVEX PROBLEMS II

DEFINITION

A point x is locally optimal if it is feasible (i.e., it satisfies the constraints of the optimization problem) and if there exists some R > 0 such that all feasible points z with ||x − z||2 ≤ R satisfy f(x) ≤ f(z).

DEFINITION

A point x is globally optimal if it is feasible and for allfeasible points z, f (x) ≤ f (z).

- We now come to the crucial element of convex optimization problems, from which they derive most of their utility.


GLOBAL OPTIMALITY IN CONVEX PROBLEMS III

- The key idea is that for a convex optimization problem all locally optimal points are globally optimal.

- Let's give a quick proof of this property by contradiction.

- Suppose that x is a locally optimal point which is not globally optimal, i.e., there exists a feasible point y such that f(x) > f(y).

- By the definition of local optimality, there exist no feasible points z such that ||x − z||2 ≤ R and f(z) < f(x), for some R > 0.


GLOBAL OPTIMALITY IN CONVEX PROBLEMS IV

- But now suppose we choose the point z = θy + (1 − θ)x with θ = R/(2||x − y||2). Then

  ||x − z||2 = ||x − (R/(2||x − y||2)) y − (1 − R/(2||x − y||2)) x||2
             = ||(R/(2||x − y||2)) (x − y)||2
             = R/2 ≤ R.

- In addition, by the convexity of f we have f(z) = f(θy + (1 − θ)x) ≤ θf(y) + (1 − θ)f(x) < f(x).

- Furthermore, since the feasible set is a convex set, and since x and y are both feasible, z = θy + (1 − θ)x will be feasible as well.

- Therefore, z is a feasible point with ||x − z||2 ≤ R and f(z) < f(x).


GLOBAL OPTIMALITY IN CONVEX PROBLEMS V

- This contradicts our assumption, showing that x cannot be locally optimal.
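This guarantee is what makes simple local methods trustworthy on convex problems. As a sketch, gradient descent on the convex function f(x) = (x − 3)² + 1 (an illustrative choice) cannot get trapped at a non-global local minimum, because the only stationary point is the global minimizer:

```python
# Gradient descent on f(x) = (x - 3)^2 + 1, which is convex:
# any point where the gradient vanishes is the global minimizer x = 3.
def grad(x):
    return 2.0 * (x - 3.0)   # derivative of (x - 3)^2 + 1

x = -10.0                    # arbitrary starting point
for _ in range(2000):
    x -= 0.1 * grad(x)       # step size 0.1 contracts the error by 0.8 per step

# x has now converged (numerically) to the global minimizer 3.0
```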


SPECIAL CASES OF CONVEX PROBLEMS I

- For a variety of reasons, it is often convenient to consider special cases of the general convex programming formulation.

- For these special cases we can often devise extremely efficient algorithms that can solve very large problems, and because of this you will probably see these special cases referred to any time people use convex optimization techniques.

- Linear Programming. We say that a convex optimization problem is a linear program (LP) if both the objective function f and the inequality constraints gi are affine functions.

- In other words, these problems have the form: minimize cT x + d subject to Gx ⪯ h,


SPECIAL CASES OF CONVEX PROBLEMS II

- Ax = b, where x ∈ Rn is the optimization variable, c ∈ Rn, d ∈ R, G ∈ Rm×n, h ∈ Rm, A ∈ Rp×n, b ∈ Rp are defined by the problem, and "⪯" denotes elementwise inequality.

- Quadratic Programming. We say that a convex optimization problem is a quadratic program (QP) if the inequality constraints gi are still all affine, but the objective function f is a convex quadratic function.

- In other words, these problems have the form: minimize (1/2) xT Px + cT x + d subject to Gx ⪯ h, Ax = b, where again x ∈ Rn is the optimization variable, c ∈ Rn, d ∈ R, G ∈ Rm×n, h ∈ Rm, A ∈ Rp×n, b ∈ Rp are defined by the problem, but we also have P ∈ Sn+, a symmetric positive semidefinite matrix.


SPECIAL CASES OF CONVEX PROBLEMS III

- Quadratically Constrained Quadratic Programming. We say that a convex optimization problem is a quadratically constrained quadratic program (QCQP) if both the objective f and the inequality constraints gi are convex quadratic functions: minimize (1/2) xT Px + cT x + d subject to (1/2) xT Qi x + riT x + si ≤ 0, i = 1, ..., m, Ax = b, where, as before, x ∈ Rn is the optimization variable, c ∈ Rn, d ∈ R, A ∈ Rp×n, b ∈ Rp, P ∈ Sn+, but we also have Qi ∈ Sn+, ri ∈ Rn, si ∈ R, for i = 1, ..., m.

- Semidefinite Programming has become more and more prevalent in many different areas of machine learning research, so you might encounter these at some point, and it is good to have an idea of what they are.


SPECIAL CASES OF CONVEX PROBLEMS IV

- We say that a convex optimization problem is a semidefinite program (SDP) if it is of the form: minimize tr(CX) subject to tr(Ai X) = bi, i = 1, ..., p, X ⪰ 0,

- where the symmetric matrix X ∈ Sn is the optimization variable, the symmetric matrices C, A1, ..., Ap ∈ Sn are defined by the problem, and the constraint X ⪰ 0 means that we are constraining X to be positive semidefinite.

- This looks a bit different from the problems we have seen previously, since the optimization variable is now a matrix instead of a vector.


SPECIAL CASES OF CONVEX PROBLEMS V

- It should be fairly obvious from the definitions that quadratic programs are more general than linear programs (since a linear program is just a special case of a quadratic program where P = 0), and likewise that quadratically constrained quadratic programs are more general than quadratic programs.

- However, what is not obvious at all is that semidefinite programs are in fact more general than all the previous types.

- That is, any quadratically constrained quadratic program (and hence any quadratic program or linear program) can be expressed as a semidefinite program.


EXAMPLES OF CONVEX OPTIMIZATION I

- Support Vector Machines.

- Constrained Least Squares.

- Maximum likelihood for Logistic Regression.


REFERENCES I

- The images and inspiration for preparing these slides have been taken from Andrew Ng's course on Machine Learning.

- The course is available on Stanford Engineering Everywhere and Coursera.