Lecture 3
Optimization Problems and Iterative Algorithms
January 13, 2016
This material was jointly developed with Angelia Nedic at UIUC for IE 598ns
Uday V. Shanbhag Lecture 3
Outline
• Special Functions: Linear, Quadratic, Convex
• Criteria for Convexity of a Function
• Operations Preserving Convexity
• Unconstrained Optimization
• First-Order Necessary Optimality Conditions
• Constrained Optimization
• First-Order Necessary Optimality Conditions
• KKT Conditions
• Iterative Algorithms
Stochastic Optimization 1
Convex Function
f is convex when dom(f) is a convex set and there holds
f(αx+ (1− α)y) ≤ αf(x) + (1− α)f(y)
for all x, y ∈ dom(f) and α ∈ [0,1]
strictly convex if the inequality is strict for all x, y ∈ dom(f) with x ≠ y, and α ∈ (0,1)
Note that dom(f) is defined as
dom(f) := {x : f(x) < +∞} .
[Figure: a convex function f, with the chord through (x, f(x)) and (y, f(y)) lying above the graph]
f is concave when −f is convex
f is strictly concave when −f is strictly convex
Examples of Convex/Concave Functions
Examples on R
Convex:
• Affine: ax + b over R for any a, b ∈ R
• Exponential: e^{ax} over R for any a ∈ R
• Power: x^p over (0,+∞) for p ≥ 1 or p ≤ 0
• Powers of absolute value: |x|^p over R for p ≥ 1
• Negative entropy: x ln x over (0,+∞)
Concave:
• Affine: ax + b over R for any a, b ∈ R
• Powers: x^p over (0,+∞) for 0 ≤ p ≤ 1
• Logarithm: ln x over (0,+∞)
Examples on Rn
• Affine functions are both convex and concave
• Norms ‖x‖, ‖x‖1, ‖x‖∞ are convex
Second-Order Conditions for Convexity
• Let f be twice differentiable and let dom(f) be the domain of f
[In general, when differentiability is considered, it is required that dom(f) is open]
• The Hessian ∇2f(x) is a symmetric n× n matrix whose entries are the
second-order partial derivatives of f at x:
[∇2f(x)]_{ij} = ∂²f(x) / (∂x_i ∂x_j) for i, j = 1, . . . , n
2nd-order conditions:
• f is convex if and only if dom(f) is a convex set and
∇2f(x) ⪰ 0 for all x ∈ dom(f)
Positive semidefiniteness of a matrix: [Recall that M ∈ Rn×n satisfies M ⪰ 0 if x^T M x ≥ 0 for all x ∈ Rn]
• f is strictly convex if dom(f) is a convex set and
∇2f(x) ≻ 0 for all x ∈ dom(f)
Positive definiteness of a matrix: [Recall that M ∈ Rn×n satisfies M ≻ 0 if x^T M x > 0 for all x ∈ Rn]
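The second-order test above is easy to check numerically. A minimal sketch in Python with NumPy (the two example Hessians are hypothetical, not from this lecture) tests positive semidefiniteness via the eigenvalues of the symmetrized matrix:

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """Check positive semidefiniteness via eigenvalues of the symmetrized matrix."""
    M = 0.5 * (M + M.T)
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

# Hessian of f(x) = x1^2 + 3*x2^2 is constant: diag(2, 6), so f is convex.
H_convex = np.diag([2.0, 6.0])
# Hessian of f(x) = x1^2 - x2^2 is diag(2, -2): indefinite, so f is not convex.
H_indef = np.diag([2.0, -2.0])

print(is_psd(H_convex))  # True
print(is_psd(H_indef))   # False
```

For a non-constant Hessian the test would have to hold at every point of dom(f), so a numerical check like this can only sample candidate points.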
Examples
• Quadratic function: f(x) = (1/2)x′Qx+ q′x+ r with a symmetric
n× n matrix Q
∇f(x) = Qx+ q, ∇2f(x) = Q
Convex for Q ⪰ 0
• Least-squares objective: f(x) = ‖Ax − b‖^2 with an m × n matrix A
∇f(x) = 2AT(Ax− b), ∇2f(x) = 2ATA
Convex for any A
• Quadratic-over-linear: f(x, y) = x^2/y
∇2f(x, y) = (2/y^3) (y, −x)(y, −x)^T ⪰ 0, treating (y, −x) as a column vector
Convex for y > 0
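The least-squares claim can be sanity-checked numerically: the Hessian 2AᵀA satisfies xᵀ(2AᵀA)x = 2‖Ax‖² ≥ 0 for every x, regardless of A. A minimal sketch (the random A is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # arbitrary matrix: the claim holds for any A
H = 2.0 * A.T @ A                 # Hessian of f(x) = ||Ax - b||^2 (independent of b)
eigs = np.linalg.eigvalsh(H)      # eigenvalues of the symmetric Hessian
print(bool(np.all(eigs >= -1e-10)))  # True: H is positive semidefinite
```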
First-Order Condition for Convexity
Let f be differentiable and let dom(f) be its domain. Then, the gradient
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn )^T
exists at each x ∈ dom(f)
• 1st-order condition: f is convex if and only if dom(f) is convex and
f(x) +∇f(x)T(z − x) ≤ f(z) for all x, z ∈ dom(f)
• Note: the first-order approximation is a global underestimator of f
• This is a very important property, used in convex optimization for algorithm design and performance analysis
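As a quick illustration of the underestimator property, the sketch below checks f(z) ≥ f(x) + f′(x)(z − x) on a grid for one hypothetical convex function on R (not an example from the slides):

```python
import numpy as np

f = lambda x: np.exp(x) + x ** 2       # a convex function on R (hypothetical example)
df = lambda x: np.exp(x) + 2.0 * x     # its derivative

x0 = 0.5
zs = np.linspace(-3.0, 3.0, 101)
tangent = f(x0) + df(x0) * (zs - x0)   # first-order approximation at x0
# The tangent line is a global underestimate of the convex function f.
print(bool(np.all(f(zs) >= tangent - 1e-12)))  # True
```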
Operations Preserving Convexity
Let f and g be convex functions over Rn
• Positive Scaling: λf is convex for λ > 0; (λf)(x) = λf(x) for all x
• Sum: f + g is convex; (f + g)(x) = f(x) + g(x) for all x
• Composition with affine function: for g affine [i.e., g(x) = Ax+ b],
the composition f ◦ g is convex, where
(f ◦ g)(x) = f(Ax+ b) for all x
• Pointwise maximum: For convex functions f1, . . . , fm, the pointwise-max function
h(x) = max {f1(x), . . . , fm(x)} is convex
• Polyhedral function: f(x) = max_{i=1,...,m} (a_i^T x + b_i) is convex
• Pointwise supremum: Let Y ⊆ Rm and f : Rn × Rm → R. Let
f(x, y) be convex in x for each y ∈ Y . Then, the supremum function
over the set Y
h(x) = sup_{y∈Y} f(x, y) is convex
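The pointwise-maximum rule can be spot-checked by sampling the convexity inequality at random pairs and weights; the three component functions below are hypothetical examples:

```python
import numpy as np

# Three convex component functions (hypothetical examples).
fs = [lambda x: (x - 1.0) ** 2, lambda x: abs(x + 2.0), lambda x: np.exp(x)]
h = lambda x: max(f(x) for f in fs)  # pointwise maximum

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    x, y = rng.uniform(-5.0, 5.0, size=2)
    a = float(rng.uniform())
    # Convexity inequality sampled at a random pair (x, y) and weight a.
    ok = ok and h(a * x + (1 - a) * y) <= a * h(x) + (1 - a) * h(y) + 1e-12
print(ok)  # True
```

A sampled check of course only provides evidence, not a proof; the proof is the one-line argument that each fi satisfies the convexity inequality and max preserves it.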
Optimization Terminology
Let C ⊆ Rn and f : C → R. Consider the following optimization problem
minimize f(x)
subject to x ∈ C
Example: C = {x ∈ Rn | g(x) ≤ 0, x ∈ X}
Terminology:
• The set C is referred to as the feasible set
• We say that the problem is feasible when C is nonempty
• The problem is unconstrained when C = Rn, and it is constrained
otherwise
• We say that a vector x∗ is an optimal solution or a global minimum when
x∗ is feasible and the value f(x∗) is not exceeded at any x ∈ C, i.e.,
x∗ ∈ C and f(x∗) ≤ f(x) for all x ∈ C
Local Minimum
minimize f(x)
subject to x ∈ C
• A vector x̄ is a local minimum for the problem if x̄ ∈ C and there is a
ball B(x̄, r) such that
f(x̄) ≤ f(x) for all x ∈ C with ‖x − x̄‖ ≤ r
• Every global minimum is also a local minimum
• When the set C is convex and the function f is convex, a local
minimum is also global
First-Order Necessary Optimality Condition:
Unconstrained Problem
Let f be a differentiable function with dom(f) = Rn and let C = Rn.
• If x is a local minimum of f over Rn, then the following holds:
∇f(x) = 0
• The gradient relation can be equivalently given as:
(y − x)′∇f(x) ≥ 0 for all y ∈ Rn
This is a variational inequality V I(K,F ) with the set K and the
mapping F given by
K = Rn, F (x) = ∇f(x)
• Solving a minimization problem can be reduced to solving a corresponding variational inequality
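For instance, for a strongly convex quadratic with C = Rn, solving VI(Rn, ∇f) amounts to solving the linear system ∇f(x) = 0. A minimal sketch (the particular Q and q are made-up data):

```python
import numpy as np

# minimize f(x) = (1/2) x^T Q x + q^T x with Q positive definite:
# VI(Rn, grad f) reduces to the linear system Qx + q = 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-1.0, 4.0])
x_star = np.linalg.solve(Q, -q)
print(bool(np.allclose(Q @ x_star + q, 0.0)))  # True: the gradient vanishes
```

With the gradient equal to zero, (y − x)ᵀ∇f(x) ≥ 0 holds trivially for all y ∈ Rn.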
First-Order Necessary Optimality Condition:
Constrained Problem
Let f be a differentiable function with dom(f) = Rn and let C ⊆ Rn be a
closed convex set.
• If x is a local minimum of f over C, then the following holds:
(y − x)′∇f(x) ≥ 0 for all y ∈ C (1)
Again, this is a variational inequality V I(K,F ) with the set K and
the mapping F given by
K = C, F (x) = ∇f(x)
• Recall that when f is convex, a local minimum is also global
• When f is convex, the preceding relation is also sufficient for x to be
a global minimum, i.e.,
if x satisfies relation (1), then x is a (global) minimum
Equality and Inequality Constrained Problem
Consider the following problem
minimize f(x)
subject to h1(x) = 0, . . . , hp(x) = 0
g1(x) ≤ 0, . . . , gm(x) ≤ 0
where f , hi and gj are continuously differentiable over Rn.
Def. For a feasible vector x, the active set of (inequality) constraints is
the set given by
A(x) = {j | gj(x) = 0}
If j ∉ A(x), we say that the j-th constraint is inactive at x
Def. We say that a vector x is regular if the gradients
∇h1(x), . . . ,∇hp(x), and ∇gj(x) for j ∈ A(x) are linearly independent
NOTE: x is regular when there are no equality constraints and all the
inequality constraints are inactive [p = 0 and A(x) = ∅]
Lagrangian Function
With the problem
minimize f(x)
subject to h1(x) = 0, . . . , hp(x) = 0
g1(x) ≤ 0, . . . , gm(x) ≤ 0
(2)
we associate the Lagrangian function L(x, λ, µ) defined by
L(x, λ, µ) = f(x) + Σ_{i=1}^{p} λi hi(x) + Σ_{j=1}^{m} µj gj(x)
where λi ∈ R for all i, and µj ∈ R+ for all j
First-Order Karush-Kuhn-Tucker (KKT)
Necessary Conditions
Th. Let x be a local minimum of the equality/inequality constrained
problem (2). Also, assume that x is regular. Then, there exist unique
multipliers λ and µ such that
• ∇xL(x, λ, µ) = 0 [L is the Lagrangian function]
• µj ≥ 0 for all j
• µj = 0 for all j ∉ A(x)
The last condition is referred to as the complementarity condition
We can compactly write them as:
g(x) ⊥ µ
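As an illustration, the KKT conditions can be verified numerically on a small hypothetical problem (not one from the lecture): minimize x1² + x2² subject to g(x) = 1 − x1 − x2 ≤ 0, whose minimizer is x = (0.5, 0.5) with multiplier µ = 1:

```python
import numpy as np

# Hypothetical problem: minimize x1^2 + x2^2 s.t. g(x) = 1 - x1 - x2 <= 0.
x = np.array([0.5, 0.5])   # candidate minimizer
mu = 1.0                   # candidate multiplier

grad_f = 2.0 * x                     # gradient of the objective
grad_g = np.array([-1.0, -1.0])      # gradient of g
g = 1.0 - x.sum()                    # constraint value (active: g = 0)

print(bool(np.allclose(grad_f + mu * grad_g, 0.0)))  # stationarity: grad_x L = 0
print(mu >= 0.0 and abs(mu * g) < 1e-12)             # mu >= 0 and complementarity
```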
In fact, the complementarity-based formulation can be used to write the
first-order optimality conditions more compactly. Consider the following
constrained optimization problem:
minimize f(x)
subject to c1(x) ≥ 0, . . . , cm(x) ≥ 0
x ≥ 0.
If x is regular, then there exist multipliers λ such that
0 ≤ x ⊥ ∇xf(x)−∇xc(x)T λ ≥ 0 (3)
0 ≤ λ ⊥ c(x) ≥ 0 (4)
More succinctly, this is a nonlinear complementarity problem, denoted by
CP (Rm+n, F ), a problem that requires a z that satisfies
0 ≤ z ⊥ F (z) ≥ 0,
where
z := (x, λ) and F (z) := ( ∇xf(x) − ∇xc(x)^T λ, c(x) ).
Second-Order KKT Necessary Conditions
Th. Let x be a local minimum of the equality/inequality constrained
problem (2). Also, assume that x is regular and that f, hi, gj are twice
continuously differentiable. Then, there exist unique multipliers λ and µ
such that
• ∇xL(x, λ, µ) = 0
• µj ≥ 0 for all j
• µj = 0 for all j ∉ A(x)
• For any vector y such that ∇hi(x)′y = 0 for all i and ∇gj(x)′y = 0
for all j ∈ A(x), the following relation holds:
y′∇2xxL(x, λ, µ)y ≥ 0
Solution Procedures: Iterative Algorithms
For solving problems, we will consider iterative algorithms
• Given an initial iterate x0
•We generate a new iterate
xk+1 = Gk(xk)
where Gk is a mapping that depends on the optimization problem
Objectives:
• Provide necessary conditions on the mappings Gk that yield a sequence
{xk} converging to a solution of the problem of interest
• Study how fast the sequence {xk} converges:
• Global convergence rate (when far from optimal points)
• Local convergence rate (when near an optimal point)
Gradient Descent Method
Consider a continuously differentiable function f . We want to
minimize f(x) over x ∈ Rn
Gradient descent method
xk+1 = xk − αk∇f(xk)
• The scalar αk > 0 is a stepsize
• The stepsize is chosen as a constant αk = α, by line search, or by another
rule, so that f(xk+1) < f(xk)
Convergence Rate:
• Looking at the tail of the error sequence e(xk) = dist(xk, X∗),
where
dist(x,A) := inf_{a∈A} ‖x − a‖ .
• Local convergence is at best linear:
lim sup_{k→∞} e(xk+1)/e(xk) ≤ q for some q ∈ (0,1)
• Global convergence is also at best linear
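A minimal sketch of the fixed-stepsize variant, applied to the well-conditioned quadratic f(x, y) = x² + y² used in Example 1 later in the lecture (the stepsize and iteration count are arbitrary choices):

```python
import numpy as np

def gradient_descent(grad, x0, alpha, iters):
    """Fixed-stepsize gradient descent: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad(x)
    return x

# minimize f(x, y) = x^2 + y^2; the unique minimizer is the origin.
grad = lambda v: 2.0 * v
x = gradient_descent(grad, [-1.0, -1.0], alpha=0.1, iters=200)
print(bool(np.allclose(x, 0.0, atol=1e-8)))  # True
```

Here the error shrinks by the constant factor |1 − 2α| = 0.8 per step, a concrete instance of linear convergence.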
Newton’s Method
Consider a twice continuously differentiable function f with Hessian ∇2f(x) ≻ 0 for all x. We want to solve the following problem:
minimize {f(x) : x ∈ Rn}
Newton’s method
xk+1 = xk − αk∇2f(xk)−1∇f(xk)
Local Convergence Rate (near x∗)
• ‖∇f(x)‖ converges to zero quadratically:
‖∇f(xk)‖ ≤ C q^{2^k} for all large enough k,
where C > 0 and q ∈ (0,1)
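A minimal sketch of the method with unit stepsize (αk = 1), applied to the quadratic f(x, y) = 0.1x² + y² from Example 2 below; for a quadratic, one Newton step lands on the minimizer exactly:

```python
import numpy as np

def newton(grad, hess, x0, iters):
    """Newton's method with unit stepsize: x_{k+1} = x_k - H(x_k)^{-1} grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# minimize the ill-conditioned quadratic f(x, y) = 0.1 x^2 + y^2.
grad = lambda v: np.array([0.2 * v[0], 2.0 * v[1]])
hess = lambda v: np.diag([0.2, 2.0])
x = newton(grad, hess, [-1.0, -1.0], iters=1)
print(bool(np.allclose(x, 0.0)))  # True: one Newton step solves a quadratic exactly
```

Unlike gradient descent, the Newton step is unaffected by the ill-conditioning of the Hessian, which is the point of Examples 1 and 2 below.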
Penalty Methods
For solving inequality constrained problems:
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . ,m
Penalty Approach: Remove the constraints but penalize their violation
Pc : minimize F (x, c) = f(x)+cP (g1(x), . . . , gm(x)) over x ∈ Rn
where c > 0 is a penalty parameter and P is some penalty function
Penalty methods operate in two stages for c and x, respectively
• Choose initial value c0
(1) Having ck, solve the problem Pck to obtain its optimal solution x∗(ck)
(2) Using x∗(ck), update ck to obtain ck+1 and go to step (1)
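The slide leaves the penalty function P unspecified; the sketch below assumes the common quadratic penalty max(0, g(x))² on a one-dimensional hypothetical problem, and shows x∗(c) approaching the constrained solution as c grows:

```python
# minimize (x - 2)^2 subject to g(x) = x - 1 <= 0; the solution is x* = 1.
# Quadratic-penalty subproblem (an assumed choice of P):
#   F(x, c) = (x - 2)^2 + c * max(0, x - 1)^2

def x_of_c(c):
    # For the minimizer with x > 1, dF/dx = 2(x - 2) + 2c(x - 1) = 0,
    # which gives x*(c) = (2 + c) / (1 + c).
    return (2.0 + c) / (1.0 + c)

for c in (1.0, 10.0, 1000.0):
    print(round(x_of_c(c), 4))  # 1.5, 1.0909, 1.001 -> tends to x* = 1
```

Note that each x∗(c) is infeasible (it violates x ≤ 1 slightly): this penalty approaches the solution from outside the feasible set, in contrast with the interior-point method at the end of the lecture.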
Q-Rates of Convergence
Let {xk} be a sequence in Rn that converges to x∗
Convergence is said to be:
1. Q-linear if ∃ r ∈ (0,1) such that ‖xk+1 − x∗‖ / ‖xk − x∗‖ ≤ r for all k > K.
Example: xk = 1 + 0.5^k converges Q-linearly to 1.
2. Q-quadratic if ∃ M > 0 such that ‖xk+1 − x∗‖ / ‖xk − x∗‖^2 ≤ M for all k > K.
Example: xk = 1 + 0.5^{2^k} converges Q-quadratically to 1.
3. Q-superlinear if lim_{k→∞} ‖xk+1 − x∗‖ / ‖xk − x∗‖ = 0.
Example: xk = 1 + k^{−k} converges Q-superlinearly to 1.
4. Q-quadratic =⇒ Q-superlinear =⇒ Q-linear
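The three example sequences can be checked numerically by forming the error ratios straight from the definitions:

```python
# Errors e_k = |x_k - 1| for the three example sequences.
lin = [0.5 ** k for k in range(1, 8)]            # x_k = 1 + 0.5^k
quad = [0.5 ** (2 ** k) for k in range(1, 5)]    # x_k = 1 + 0.5^(2^k)
sup = [float(k) ** (-k) for k in range(2, 8)]    # x_k = 1 + k^(-k)

# Q-linear: the ratio e_{k+1}/e_k is the constant r = 0.5 < 1.
ratios_lin = [b / a for a, b in zip(lin, lin[1:])]
print(all(abs(r - 0.5) < 1e-12 for r in ratios_lin))  # True

# Q-quadratic: e_{k+1} = e_k^2, so e_{k+1}/e_k^2 = 1 (any M >= 1 works).
ratios_quad = [b / a ** 2 for a, b in zip(quad, quad[1:])]
print(all(abs(r - 1.0) < 1e-12 for r in ratios_quad))  # True

# Q-superlinear: the ratio e_{k+1}/e_k decreases toward 0.
ratios_sup = [b / a for a, b in zip(sup, sup[1:])]
print(all(b < a for a, b in zip(ratios_sup, ratios_sup[1:])))  # True
```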
Example 1
f(x, y) = x2 + y2
1. Steepest descent from (−1, −1)
2. Newton from (−1, −1)
3. Newton from (−1, 1)
Figure 1: Well-Conditioned Function: Steepest, Newton, Newton (contour plots over x, y ∈ [−2, 2])
Example 2
f(x, y) = 0.1x2 + y2
1. Steepest descent from (−1, −1)
2. Newton from (−1, −1)
3. Newton from (−1, 1)
Figure 2: Ill-Conditioned Function: Steepest, Newton, Newton (contour plots over x, y ∈ [−2, 2])
Interior-Point Methods
Solve the inequality-constrained (and, more generally, constrained) problem:
minimize f(x)
subject to gj(x) ≤ 0, j = 1, . . . ,m
The IPM solves a sequence of problems parametrized by t > 0:
minimize f(x) − (1/t) Σ_{j=1}^{m} ln(−gj(x)) over x ∈ Rn
• Can be viewed as a penalty method with
• Penalty parameter c = 1/t
• Penalty function
P (u1, . . . , um) = − Σ_{j=1}^{m} ln(−uj)
This function is known as the logarithmic barrier or log-barrier function
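A minimal one-dimensional sketch (the problem is a hypothetical example, not from the slides): minimize f(x) = x subject to g(x) = −x ≤ 0, so the barrier subproblem is min over x > 0 of x − (1/t) ln x, whose minimizer 1/t tends to the solution x∗ = 0 as t grows:

```python
# minimize f(x) = x subject to g(x) = -x <= 0; the solution is x* = 0.
# Barrier subproblem: minimize x - (1/t) * ln(x) over x > 0.

def barrier_argmin(t):
    # Setting the derivative 1 - 1/(t*x) to zero gives x = 1/t.
    return 1.0 / t

for t in (1.0, 10.0, 1000.0):
    print(barrier_argmin(t))  # 1.0, 0.1, 0.001 -> tends to x* = 0
```

In contrast with the exterior quadratic penalty, every barrier iterate is strictly feasible, which is why these are called interior-point methods.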
References for this lecture
The material for this lecture:
• (B) Bertsekas D.P. Nonlinear Programming
• Chapter 1 and Chapter 3 (descent and Newton’s methods, KKT
conditions)
• (FP) Facchinei and Pang Finite Dimensional ..., Vol I (Part on Complementarity Problems)
• Chapter 1 for Normal Cone, Dual Cone, and Tangent Cone
• (BNO) Bertsekas, Nedic, Ozdaglar Convex Analysis and Optimization
• Chapter 1 (convex functions)