Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Lijun Xu, Optimization Group Meeting, November 27, 2012

By I. Necoara, Y. Nesterov, and F. Glineur



Outline

• Introduction
• Randomized Block (i,j) Coordinate Descent Method
• RCD Method in Strongly Convex Case
• Random Pairs Sampling
• Extensions
• Numerical experiment

Introduction

• Coordinate Descent Method

Q: How to choose the coordinate to update?
a) Cyclic (difficult to prove convergence).
b) Maximal descent: the convergence rate is trivial (worse than the simple Gradient Method in general).
c) Random (faster, simpler, robust, suited to distributed and parallel computation, etc.).
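The three selection rules can be compared on a toy problem. The sketch below is only an illustration: the separable quadratic objective, the function name `coordinate_descent`, and the step size 1/L_i with L_i = a[i] are assumptions made here, not taken from the slides. It shows that the maximal descent rule needs every partial derivative at each step, while the cyclic and random rules need only one.

```python
import random

def coordinate_descent(a, b, rule, iters, seed=0):
    """Minimize f(x) = 0.5 * sum(a[i] * (x[i] - b[i])**2), one coordinate per step."""
    rng = random.Random(seed)
    n = len(a)
    x = [0.0] * n

    def partial(i):
        # i-th partial derivative of the toy objective
        return a[i] * (x[i] - b[i])

    for k in range(iters):
        if rule == "cyclic":
            i = k % n                      # sweep 0, 1, ..., n-1, 0, ...
        elif rule == "greedy":
            # maximal descent: evaluates all n partial derivatives every step
            i = max(range(n), key=lambda t: abs(partial(t)))
        elif rule == "random":
            i = rng.randrange(n)           # uniformly random coordinate
        else:
            raise ValueError(f"unknown rule: {rule}")
        x[i] -= partial(i) / a[i]          # step 1/L_i with L_i = a[i]
    return x
```

For this particular objective each step solves its coordinate exactly, so the rules differ only in which coordinates get visited and in the per-step cost.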

Introduction

• Randomized (block) coordinate descent methods

a) The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010) [1].

b) The extension to composite functions was given by Richtárik and Takáč (2011) [2].

[1] Y. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper, 2010.
[2] P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, submitted to Mathematical Programming, 2011.

Problem formulation

• Minimize a separable convex objective function with linearly coupled constraints:

$$\min_{x \in \mathbb{R}^{Nn}} f(x) = f_1(x_1) + \cdots + f_N(x_N) \quad \text{s.t.} \quad x_1 + \cdots + x_N = 0.$$

• Extension to problems with a non-separable objective function and general linear constraints.

Motivation of Formulation

• Applications in:
  • resource allocation in economic systems,
  • distributed computer systems,
  • traffic equilibrium problems,
  • network flow, etc.
• Dual problem corresponding to an optimization of a sum of convex functions.
• Finding a point in the intersection of some convex sets.

Notations

• (2.1) becomes

$$\min_{x \in \mathbb{R}^{Nn}} f(x) \quad \text{s.t.} \quad Ux = 0,$$

where $Ux = \sum_{i=1}^{N} x_i$, i.e. $U = [I_n \ \cdots \ I_n]$.

• KKT:

$$Ux^* = 0 \;\Leftrightarrow\; \sum_{i=1}^{N} x_i^* = 0,$$

$$\nabla f(x^*) = U^T \lambda^* \;\Leftrightarrow\; \bigl(\nabla f_1(x_1^*)^T, \dots, \nabla f_N(x_N^*)^T\bigr)^T = \bigl(\lambda^{*T}, \dots, \lambda^{*T}\bigr)^T \;\Leftrightarrow\; \nabla f_i(x_i^*) = \nabla f_j(x_j^*) \quad \forall\, i \ne j \in \{1, \dots, N\}.$$
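The KKT characterization (all block gradients equal at the optimum, and the blocks summing to zero) can be checked numerically on a toy instance. The sketch below assumes scalar blocks (n = 1) and hypothetical quadratic terms f_i(x_i) = 0.5 * a_i * (x_i - b_i)^2; the function name `solve_coupled_quadratic` is introduced here for illustration only.

```python
def solve_coupled_quadratic(a, b):
    """Closed-form minimizer of sum_i 0.5*a[i]*(x[i]-b[i])**2 subject to sum(x) == 0.

    Stationarity gives a[i]*(x[i]-b[i]) = lam for every i, so x[i] = b[i] + lam/a[i];
    the constraint sum(x) == 0 then fixes the common gradient value lam.
    """
    lam = -sum(b) / sum(1.0 / ai for ai in a)
    x = [bi + lam / ai for ai, bi in zip(a, b)]
    return x, lam


# Usage: every block gradient equals the multiplier, and the constraint holds.
a = [1.0, 2.0, 4.0]
b = [3.0, -1.0, 0.5]
x_star, lam_star = solve_coupled_quadratic(a, b)
gradients = [ai * (xi - bi) for ai, xi, bi in zip(a, x_star, b)]
```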

Notations

• Consider the subspace
• Its orthogonal complement
• Define the extended norm induced by G (for the gradients):
• Cauchy-Schwarz inequality:

Notations

• Partition of the identity matrix:

$$U_i = \begin{bmatrix} 0_{n \times n} \\ \vdots \\ I_{n \times n} \\ \vdots \\ 0_{n \times n} \end{bmatrix} \quad \leftarrow \ i\text{-th entry}$$

$$x = \sum_{i=1}^{N} U_i x_i, \quad x_i \in \mathbb{R}^n,$$

$$f(U_i \alpha) = f_i(\alpha), \quad \alpha \in \mathbb{R}^n$$

(exact when $f_j(0) = 0$ for all $j \ne i$, otherwise up to an additive constant).

$$x^+ = x + d = \sum_{i=1}^{N} U_i (x_i + d_i),$$

$$f(x^+) = \sum_{i=1}^{N} f_i(x_i + d_i), \quad x_i, d_i \in \mathbb{R}^n.$$

Basic Assumption

• All $f_i$ are convex.
• The gradients $\nabla f_i$ are Lipschitz continuous (with Lipschitz constants $L_i > 0$), i.e.:

$$\|\nabla f_i(x_i) - \nabla f_i(y_i)\| \le L_i \|x_i - y_i\| \quad \forall\, x_i, y_i \in \mathbb{R}^n.$$

• The graph $(V, E)$ is undirected and connected, with $N$ nodes $V = \{1, \dots, N\}$. Use pairs $(i, j) \in E$ as the chosen coordinates.

Randomized Block (i,j) Coordinate Descent Method

• Recall

$$\min_{x \in \mathbb{R}^{Nn}} f(x) = f_1(x_1) + \cdots + f_N(x_N) \quad \text{s.t.} \quad x_1 + \cdots + x_N = 0.$$

• Choose randomly a pair $(i, j) \in E$ with probability $p_{ij}\ (= p_{ji}) > 0$.

• Define the update $x^+ = x + U_i d_i + U_j d_j$.

Randomized Block (i,j) Coordinate Descent Method

• Consider feasibility of $x^+$, i.e. we require $d_i + d_j = 0$.
• Minimize the right-hand side subject to this feasibility condition.
• Get the following decrease in $f$:
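The formulas for this step did not survive in the transcript. One way to work it out, as a sketch that assumes the update changes only blocks $i$ and $j$ (by $d_i$ and $d_j$) and uses the standard quadratic upper bound implied by Lipschitz continuity of $\nabla f_i$ and $\nabla f_j$:

```latex
% Quadratic upper bound when only blocks i and j move:
\[
f(x^+) \le f(x) + \langle \nabla f_i(x_i), d_i \rangle + \langle \nabla f_j(x_j), d_j \rangle
          + \frac{L_i}{2}\|d_i\|^2 + \frac{L_j}{2}\|d_j\|^2 .
\]
% Impose feasibility d_j = -d_i and minimize the right-hand side over d_i:
\[
d_i = -\frac{1}{L_i + L_j}\bigl(\nabla f_i(x_i) - \nabla f_j(x_j)\bigr), \qquad d_j = -d_i .
\]
% Substituting back gives the decrease:
\[
f(x^+) \le f(x) - \frac{1}{2(L_i + L_j)} \bigl\|\nabla f_i(x_i) - \nabla f_j(x_j)\bigr\|^2 .
\]
```

The decrease is zero exactly when the two block gradients coincide, which matches the KKT condition $\nabla f_i(x_i^*) = \nabla f_j(x_j^*)$.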

Randomized Block (i,j) Coordinate Descent Method

• Each iteration computes only $\nabla f_i(x_i)$ and $\nabla f_j(x_j)$; full gradient methods compute $\nabla f(x)$.
• The iterate depends on a random variable:
• Define the expected value:
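A pair-update iteration of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it uses scalar blocks (n = 1), hypothetical quadratic terms f_i(x_i) = 0.5 * a_i * (x_i - b_i)^2 (so L_i = a_i), and the step d_i = -(grad_i - grad_j) / (L_i + L_j), d_j = -d_i. The name `rcd_pairs` is made up for this sketch.

```python
import random

def rcd_pairs(a, b, edges, probs, iters, seed=0):
    """Random (i, j) coordinate descent for
    min sum_i 0.5*a[i]*(x[i]-b[i])**2  subject to  sum(x) == 0."""
    rng = random.Random(seed)
    x = [0.0] * len(a)                      # feasible start: the blocks sum to zero
    for _ in range(iters):
        # draw one edge (i, j) according to the given probabilities
        i, j = rng.choices(edges, weights=probs, k=1)[0]
        gi = a[i] * (x[i] - b[i])           # only two partial gradients are evaluated
        gj = a[j] * (x[j] - b[j])
        d = -(gi - gj) / (a[i] + a[j])      # Lipschitz constants L_i = a[i], L_j = a[j]
        x[i] += d                           # d_i = d
        x[j] -= d                           # d_j = -d keeps sum(x) unchanged
    return x
```

Because d_i + d_j = 0 at every step, the iterates stay feasible without any projection, and each step touches two blocks regardless of N.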

Randomized Block (i,j) Coordinate Descent Method

• Key inequality:

• where

$$G_{ij} = (e_i - e_j)(e_i - e_j)^T \otimes I_n \in \mathbb{R}^{Nn \times Nn}.$$
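The key inequality itself is missing from the transcript. A plausible form can be assembled from the per-pair decrease $f(x^+) \le f(x) - \frac{1}{2(L_i + L_j)}\|\nabla f_i(x_i) - \nabla f_j(x_j)\|^2$, assuming the pair $(i,j)$ is drawn with probability $p_{ij}$ and the probabilities sum to one over $E$. The aggregate matrix $G$ below is notation introduced for this sketch and may differ from the slide.

```latex
% The matrix G_{ij} picks out the squared gradient difference of the pair:
\[
\nabla f(x)^T G_{ij} \nabla f(x) = \bigl\|\nabla f_i(x_i) - \nabla f_j(x_j)\bigr\|^2 .
\]
% Taking the expectation over the random pair, conditioned on x:
\[
\mathbb{E}\bigl[f(x^+) \mid x\bigr]
  \le f(x) - \sum_{(i,j) \in E} \frac{p_{ij}}{2(L_i + L_j)} \nabla f(x)^T G_{ij} \nabla f(x)
  = f(x) - \frac{1}{2} \nabla f(x)^T G \, \nabla f(x),
\]
\[
G = \sum_{(i,j) \in E} \frac{p_{ij}}{L_i + L_j} \, G_{ij} .
\]
```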

Randomized Block (i,j) Coordinate Descent Method

• Introduce the distance

which measures the size of the level set of $f$ given by $x^0$.

• Convergence results:

Randomized Block (i,j) Coordinate Descent Method

• Proof: use convexity and the key inequality, then take the expectation.

Design of the probability

• Uniform probabilities: $p_{ij} = 1/|E|$.
• Dependent on the Lipschitz constants:
• Design the probability, since
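The two simple choices can be sketched as follows. The uniform case is unambiguous; for the Lipschitz-dependent case the slide's formula is missing, so the weighting used here, proportional to (L_i + L_j) ** alpha, is an assumption made for illustration, as are the function name `pair_probabilities` and the parameter `alpha`.

```python
def pair_probabilities(edges, L, alpha=0.0):
    """Return one probability per edge, proportional to (L[i] + L[j]) ** alpha.

    alpha = 0 gives uniform probabilities 1/|E|; alpha > 0 favours pairs with
    larger Lipschitz constants (an assumed weighting, not taken from the slides).
    """
    weights = [(L[i] + L[j]) ** alpha for i, j in edges]
    total = sum(weights)
    return [w / total for w in weights]


# Usage on a triangle graph with hypothetical Lipschitz constants.
edges = [(0, 1), (0, 2), (1, 2)]
L = [1.0, 2.0, 3.0]
uniform = pair_probabilities(edges, L)             # alpha = 0
weighted = pair_probabilities(edges, L, alpha=1.0)
```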

Design of the probability

• Recall the convergence rate:
• Idea: search for the probabilities that optimize the rate; a constant is assumed such that

Design of the probability

• Using the relaxation from semidefinite programming:

where $R^2 = (R_1^2, \dots, R_N^2)^T$, and the remaining variables are multipliers in the Lagrangian relaxation.

Design of the probability

• Note

• Convergence rate under the designed probability:

Comparison with full gradient method

• Consider a particular case:
a) a complete graph,
b) probability

[Matrix expression of size $N \times N$ in $N$ and the inverse Lipschitz constants; the entries are not recoverable from the extracted text.]

• Upper bound (BCD method):

Comparison with full gradient method

• Full gradient method

similarly,

(full)

(random)

Strongly Convex Case

• $f$ is strongly convex w.r.t. the norm, with convexity parameter

and key inequality:

minimizing over $x$

Strongly Convex Case

• Similarly, choose the optimal probability by solving the following SDP:

Rate of convergence in probability

• The proof uses reasoning similar to that of Theorem 1 in [14] and is derived from Markov's inequality.

[14] P. Richtárik and M. Takáč, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, submitted to Mathematical Programming, 2011.
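The role of Markov's inequality here can be made concrete. The following is a generic sketch, not the slide's theorem: it assumes feasible iterates, so that $f(x^k) - f^* \ge 0$, and uses a hypothetical confidence level $\rho \in (0,1)$ and accuracy $\epsilon > 0$.

```latex
% Markov's inequality applied to the nonnegative random variable f(x^k) - f^*:
\[
\Pr\bigl(f(x^k) - f^* \ge \epsilon\bigr) \le \frac{\mathbb{E}\bigl[f(x^k)\bigr] - f^*}{\epsilon} .
\]
% Hence a bound in expectation converts into a bound in probability:
\[
\mathbb{E}\bigl[f(x^k)\bigr] - f^* \le \epsilon \rho
\quad \Longrightarrow \quad
\Pr\bigl(f(x^k) - f^* < \epsilon\bigr) \ge 1 - \rho .
\]
```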


Rate of convergence in probability

Random pairs sampling

• The $(RCD)_{(i,j)}$ method needs to choose a pair of coordinates $(i, j)$ at each iteration.
• So we need a fast procedure to generate random pairs.
• Given the probability distribution, redefine it as a vector over $n_p = |E|$ indices such that:

then divide [0,1] into $n_p$ subintervals:

Remark:

Random pairs sampling

• Clearly, the width of interval $l$ equals the probability $p_{i_l j_l}$.
• Sampling Algorithm Description
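The algorithm description itself is missing from the transcript. A minimal sketch of interval-based sampling, assuming the edges are listed in a fixed order l = 0, ..., n_p - 1 and that the l-th subinterval of [0,1] has width equal to the l-th probability, is given below. The function name `build_sampler` is hypothetical, and the binary search via `bisect` is one reasonable way to locate the interval, not necessarily the one used by the authors.

```python
import bisect
import itertools
import random

def build_sampler(edges, probs, seed=0):
    """Return a function that draws an edge (i_l, j_l) with probability probs[l]."""
    # Right endpoints of the subintervals of [0, 1]: cumulative sums of probs.
    endpoints = list(itertools.accumulate(probs))
    rng = random.Random(seed)

    def sample():
        u = rng.random()                          # uniform draw in [0, 1)
        l = bisect.bisect_right(endpoints, u)     # first interval whose endpoint exceeds u
        l = min(l, len(edges) - 1)                # guard against rounding in the last endpoint
        return edges[l]

    return sample
```

After the one-time O(n_p) setup, each draw costs O(log n_p) through binary search, which keeps pair generation cheap compared with the gradient evaluations.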

Generalizations

• Extension of $(RCD)_{(i,j)}$ to more than one pair.
• The same rate of convergence as in the previous sections is obtained for $(RCD)_M$.

Generalizations

• Extension of $(RCD)_{(i,j)}$ to nonseparable objective functions with general equality constraints.
• $f$ has component-wise Lipschitz continuous gradient:

Generalizations

• Assuming

$$\arg\min_{A_i s_i + A_j s_j = 0}$$

Generalizations

• Similar convergence rate:

• Similar choice of the probability:


Google Problem

Goal:


Thank you!