Practical Session on Convex Optimization: Convex Analysis

Mark Schmidt

INRIA/ENS

September 2011

Mark Schmidt MLSS 2011 Convex Analysis

Motivation: Properties of Convex Functions

Two key properties of convex functions:

All local minima are global minima.

Global rates of convergence can be analyzed.

Convexity: Zero-order condition

A real-valued function f is convex if

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y),

for all x, y ∈ R^n and all 0 ≤ θ ≤ 1.

The function lies on or below the chord from x to y.

Show that all local minima are global minima.
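The zero-order condition can be spot-checked numerically. The following is a minimal Python sketch, not part of the original slides; the helper name `find_convexity_violation` and the two test functions are illustrative assumptions. A random search can refute convexity by finding a violating triple, but finding none proves nothing.

```python
import math
import random

def find_convexity_violation(f, dim, trials=10_000, scale=5.0, tol=1e-9, seed=0):
    """Search for (x, y, theta) with f(θx + (1-θ)y) > θf(x) + (1-θ)f(y).

    Returns a violating triple, or None if no violation was found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-scale, scale) for _ in range(dim)]
        y = [rng.uniform(-scale, scale) for _ in range(dim)]
        theta = rng.random()
        # Point on the segment between x and y.
        z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
        chord = theta * f(x) + (1 - theta) * f(y)
        if f(z) > chord + tol:
            return x, y, theta
    return None

def sum_of_squares(v):
    """Convex example: f(v) = sum_i v_i^2."""
    return sum(vi * vi for vi in v)

def sine_sum(v):
    """Non-convex example: f(v) = sum_i sin(v_i)."""
    return sum(math.sin(vi) for vi in v)
```

Calling `find_convexity_violation(sine_sum, 2)` should return a counterexample triple, while the same search on `sum_of_squares` should come back empty.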

Exercise: Convexity of Norms

A real-valued function f is a norm if:

1. f(x) ≥ 0, f(0) = 0.

2. f(θx) = |θ| f(x).

3. f(x + y) ≤ f(x) + f(y).

Show that norms are convex.

Hint: use the triangle inequality, then homogeneity.
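One way the hint can play out, for 0 ≤ θ ≤ 1 (a sketch of the argument, not necessarily the intended solution):

```latex
\begin{align*}
f(\theta x + (1-\theta) y)
  &\le f(\theta x) + f((1-\theta) y) && \text{(triangle inequality)} \\
  &= |\theta|\, f(x) + |1-\theta|\, f(y) && \text{(homogeneity)} \\
  &= \theta f(x) + (1-\theta) f(y) && \text{(since } 0 \le \theta \le 1\text{)}.
\end{align*}
```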

Strict Convexity

A real-valued function f is strictly convex if

f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y),

for all x, y ∈ R^n with x ≠ y and all 0 < θ < 1.

The function lies strictly below the chord between x and y.

Show that the global minimum of a strictly convex function is unique.
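A sketch of one possible argument, by contradiction: suppose x ≠ y both attain the global minimum value, so f(x) = f(y). Taking θ = 1/2 in the strict inequality gives

```latex
f\!\left(\tfrac{1}{2}x + \tfrac{1}{2}y\right)
  < \tfrac{1}{2} f(x) + \tfrac{1}{2} f(y) = f(x),
```

so the midpoint has a strictly smaller value than the supposed minimum, which is impossible.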

Convexity: First-order condition

A real-valued differentiable function f is convex iff

f(x) ≥ f(y) + ∇f(y)^T (x − y),

for all x, y ∈ R^n.

The function lies globally on or above the tangent at y.

Show that any stationary point is a global minimum.
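The first-order condition can also be spot-checked numerically. Below is a minimal Python sketch under stated assumptions: log-sum-exp is chosen here purely as an example of a differentiable convex function, and the helper names `tangent_gap` and `min_tangent_gap` are hypothetical. The gap f(x) − [f(y) + ∇f(y)^T (x − y)] should never be negative; at a stationary point y the gradient term vanishes, so the gap reduces to f(x) − f(y) ≥ 0.

```python
import math
import random

def logsumexp(v):
    """f(v) = log(sum_i exp(v_i)), computed stably by shifting by max(v)."""
    m = max(v)
    return m + math.log(sum(math.exp(vi - m) for vi in v))

def logsumexp_grad(v):
    """Gradient of logsumexp, which is the softmax of v."""
    m = max(v)
    w = [math.exp(vi - m) for vi in v]
    total = sum(w)
    return [wi / total for wi in w]

def tangent_gap(f, grad, x, y):
    """Return f(x) - [f(y) + grad(y)^T (x - y)]."""
    g = grad(y)
    tangent = f(y) + sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
    return f(x) - tangent

def min_tangent_gap(f, grad, dim, trials=5000, scale=3.0, seed=0):
    """Smallest tangent gap observed over random pairs (x, y)."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-scale, scale) for _ in range(dim)]
        y = [rng.uniform(-scale, scale) for _ in range(dim)]
        smallest = min(smallest, tangent_gap(f, grad, x, y))
    return smallest
```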

Exercise: Zero- to First-Order Condition

Show that the zero-order condition,

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y),

implies the first-order condition,

f(x) ≥ f(y) + ∇f(y)^T (x − y).

Hints:

1. Use: θx + (1 − θ)y = y + θ(x − y).

2. Use: ∇f(y)^T d = lim_{θ→0} [f(y + θd) − f(y)] / θ.
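Putting the two hints together, one possible derivation for 0 < θ ≤ 1 (a sketch, with d = x − y):

```latex
\begin{align*}
f(y + \theta(x - y)) &\le \theta f(x) + (1-\theta) f(y) \\
f(y + \theta(x - y)) - f(y) &\le \theta \,[\, f(x) - f(y) \,] \\
\frac{f(y + \theta(x - y)) - f(y)}{\theta} &\le f(x) - f(y) \\
\nabla f(y)^T (x - y) &\le f(x) - f(y)
  && \text{(letting } \theta \to 0^+\text{)}.
\end{align*}
```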

Convexity: Second-order condition

A real-valued twice-differentiable function f is convex iff

∇^2 f(x) ⪰ 0

for all x ∈ R^n.

The function is flat or curved upwards in every direction.

A real-valued function f is a quadratic if it can be written in the form:

f(x) = x^T A x + b^T x + c.

Give sufficient conditions for a quadratic function to be convex.
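For the quadratic above, the Hessian is A + A^T, so the second-order condition reduces to an eigenvalue test. The following is a minimal sketch assuming NumPy is available; the function name `quadratic_is_convex` and the tolerance are illustrative choices, not from the slides.

```python
import numpy as np

def quadratic_is_convex(A, tol=1e-10):
    """Return True if f(x) = x^T A x + b^T x + c is convex.

    The Hessian of f is A + A^T (b and c do not affect curvature), so f is
    convex exactly when A + A^T has no negative eigenvalue.
    """
    A = np.asarray(A, dtype=float)
    hessian = A + A.T
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(hessian).min() >= -tol)
```

Note that A itself need not be symmetric: only its symmetric part matters, so a matrix such as [[1, 5], [-5, 1]] still yields a convex quadratic.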

Exercise: Convexity of Basic Functions

Show that the following are convex:

1. f(x) = exp(ax)

2. f(x) = x log x (for x > 0)

3. f(x) = a^T x

4. f(x) = ||x||^2

5. f(x) = max_i {x_i}
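For the twice-differentiable items, the second-order condition is enough. A sketch, assuming the natural logarithm and reading item 4 as the squared Euclidean norm:

```latex
\begin{align*}
\frac{d^2}{dx^2} \exp(ax) &= a^2 \exp(ax) \ge 0, \\
\frac{d^2}{dx^2} \, x \log x &= \frac{1}{x} > 0 \quad \text{for } x > 0, \\
\nabla^2 (a^T x) &= 0 \succeq 0, \\
\nabla^2 \|x\|^2 &= 2I \succeq 0.
\end{align*}
```

The maximum in item 5 is not differentiable everywhere, so it calls for the zero-order condition instead.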

Other Examples of Convex Functions

Some other notable convex functions:

1. f(x, y) = log(e^x + e^y)

2. f(X) = −log det X (for X positive-definite)

3. f(x, Y) = x^T Y^{-1} x (for Y positive-definite)

Operations that Preserve Convexity

1. Non-negative weighted sum:

f(x) = θ_1 f_1(x) + θ_2 f_2(x) + · · · + θ_n f_n(x), with each θ_i ≥ 0.

2. Composition with affine mapping:

g(x) = f(Ax + b).

3. Pointwise maximum:

f(x) = max_i {f_i(x)}.

Show that least-residual problems are convex for any ℓ_p-norm:

f(x) = ||Ax − b||_p

Show that SVMs are convex:

f(x) = ||x||^2 + C Σ_{i=1}^n max{0, 1 − b_i a_i^T x}.
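One way to assemble both results from the three rules (a sketch; the names g, h_i and r are introduced here for illustration, and C ≥ 0 is assumed):

```latex
\begin{align*}
g(x) &= \|Ax - b\|_p
  && \text{a norm (convex) composed with an affine mapping,} \\
h_i(x) &= \max\{0,\; 1 - b_i a_i^T x\}
  && \text{pointwise maximum of two affine (hence convex) functions,} \\
r(x) &= \|x\|^2
  && \text{convex, since } \nabla^2 r(x) = 2I \succeq 0, \\
f(x) &= r(x) + C \sum_{i=1}^n h_i(x)
  && \text{non-negative weighted sum of convex functions.}
\end{align*}
```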

Convergence Rate: Strongly-Convex Functions

Assume that f is twice-differentiable, where for all x we have

μI ⪯ ∇^2 f(x) ⪯ LI,

for some μ > 0 and L < ∞.

By Taylor's theorem, for any x and y we have

f(y) = f(x) + ∇f(x)^T (y − x) + (1/2)(y − x)^T ∇^2 f(z)(y − x),

for some z on the line segment between x and y.

Combining Taylor's theorem with the bounds on the Hessian, we get for all x and y that

f(y) ≤ f(x) + ∇f(x)^T (y − x) + (L/2)||y − x||^2,

f(y) ≥ f(x) + ∇f(x)^T (y − x) + (μ/2)||y − x||^2.
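The step behind these two bounds is that the eigenvalues of the Hessian lie between μ and L, so for any vector v (a sketch, to be applied with v = y − x in the second-order Taylor term):

```latex
\mu \|v\|^2 \;\le\; v^T \nabla^2 f(z)\, v \;\le\; L \|v\|^2 .
```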

Use these to show that the gradient iteration

x_{k+1} = x_k − (1/L)∇f(x_k),

has the linear convergence rate

f(x_k) − f(x*) ≤ (1 − μ/L)^k [f(x_0) − f(x*)].
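One standard route to this rate is sketched below (the slides leave it as an exercise, so this is one possible argument rather than the intended one). The first line applies the upper bound at the gradient step; the second minimizes both sides of the lower bound over y; the third combines them.

```latex
\begin{align*}
f(x_{k+1}) &\le f(x_k) - \frac{1}{L}\|\nabla f(x_k)\|^2
              + \frac{1}{2L}\|\nabla f(x_k)\|^2
            = f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2, \\
f(x^*) &\ge f(x_k) - \frac{1}{2\mu}\|\nabla f(x_k)\|^2
  \quad\Longrightarrow\quad
  \|\nabla f(x_k)\|^2 \ge 2\mu\,[\,f(x_k) - f(x^*)\,], \\
f(x_{k+1}) - f(x^*) &\le \left(1 - \frac{\mu}{L}\right)[\,f(x_k) - f(x^*)\,].
\end{align*}
```

Applying the last inequality k times gives the stated bound.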

Use this result to get a convergence rate on ||x_k − x*||.

Show that if μ = 0 we get the sublinear rate O(1/k).
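The linear rate can be observed on a small example. The following is a minimal sketch assuming NumPy and a strongly-convex quadratic f(x) = (1/2) x^T H x − b^T x, for which μ and L are the extreme eigenvalues of H; the function name `gradient_descent_gaps` and the random test problem are illustrative assumptions, not from the slides.

```python
import numpy as np

def gradient_descent_gaps(H, b, x0, iters=50):
    """Run x_{k+1} = x_k - (1/L) grad f(x_k) on f(x) = 0.5 x^T H x - b^T x.

    Returns (gaps, mu, L), where gaps[k] = f(x_k) - f(x*) and mu, L are the
    smallest and largest eigenvalues of the symmetric positive-definite H.
    """
    eigs = np.linalg.eigvalsh(H)          # ascending order
    mu, L = eigs[0], eigs[-1]

    def f(x):
        return 0.5 * x @ H @ x - b @ x

    x_star = np.linalg.solve(H, b)        # minimizer: gradient H x - b = 0
    f_star = f(x_star)
    x = np.array(x0, dtype=float)
    gaps = [f(x) - f_star]
    for _ in range(iters):
        x = x - (H @ x - b) / L           # gradient step with step size 1/L
        gaps.append(f(x) - f_star)
    return np.array(gaps), mu, L

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T + np.eye(5)                   # symmetric positive-definite Hessian
b = rng.standard_normal(5)
gaps, mu, L = gradient_descent_gaps(H, b, x0=np.zeros(5))

# Theoretical bound (1 - mu/L)^k [f(x_0) - f(x*)] for k = 0, 1, ..., iters.
bound = (1 - mu / L) ** np.arange(len(gaps)) * gaps[0]
```

Comparing `gaps` against `bound` elementwise shows the observed gap staying at or below the theoretical envelope; on quadratics the actual decrease is typically faster than the bound.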

References

Most of this lecture is based on material from Boyd and Vandenberghe's very good "Convex Optimization" book, as well as their online notes.
