Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic...

20
Stochastic Optimization for Large-scale Optimal Transport (OT) Sixiong You Mechanical and Aerospace Engineering Department

Transcript of Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic...

Page 1: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

Stochastic Optimization forLarge-scale Optimal Transport (OT)

Sixiong You

Mechanical and Aerospace Engineering Department

Page 2: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

2

I. Introduction

• Motivation• Optimal transport (OT) defines a powerful framework to compare

probability distributions in a geometrically faithful way.• Previous works are purely discrete and cannot cope with continuous

densities, The only known class of methods that can overcome thislimitation are so-called semi-discrete solvers.

• In addition, the practical impact of OT is still limited because of itscomputational burden

• This paper propose a new class of stochastic optimization algorithmsto cope with large-scale OT problems.

Page 3: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

3

I. Introduction

• This paper introduces three kinds of stochastic optimization methods to cope with three possible settings:

• Discrete OT: compare a discrete vs. another discrete measure Stochastic averaged gradient (SAG) method• Semi-discrete OT: compare a discrete vs. a continuous measure Averaged stochastic gradient descent (SGD)• Continous OT: to compare a continuous vs. another continuous

measure makes use of an expansion of the dual variables in a reproducing

kernel Hilbert space (RKHS)

Page 4: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

4

II. Problem Formulation

The definition of joint probability measures

Presenter
Presentation Notes
Search for the minimum-time path for a UAV flying through a specified area with fixed starting and ending points and multiple avoidance zones, including circular and elliptical zones.
Page 5: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

5

II. Problem Formulation

The definition of Kullback-Leibler divergence

Presenter
Presentation Notes
Search for the minimum-time path for a UAV flying through a specified area with fixed starting and ending points and multiple avoidance zones, including circular and elliptical zones.
Page 6: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

6

II. Problem FormulationThe Kantorovich formulation of OT and its entropic regularization can be written in a single convex optimization problem

In which is the cost to move a unit of mass from x to y( , )c x y

Page 7: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

7

II. Problem Formulation

Fenchel-Rockafellar’s dual theorem

Solving ,plugging this expression back in ,( , )0, 0 cF u v u vu

εεε ∂> = ⇒ =

∂( )D ε

Page 8: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

8

II. Problem Formulation

Page 9: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

9

III. Discrete Optimal Transport

Stochastic gradient descent (SGD): the gradient of that term can be used as a proxy for the full gradient in a standard gradient ascent step to maximize

Page 10: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

10

III. Discrete Optimal TransportStochastic gradient descent (SGD)

Example: For an optimization problem

( )1

1min min ( )n

ii

J Q Qnω ω

ω ω=

= = ∑Objective function

When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations :

( ) ( )1

: /n

iQ Q nω ω η ω ω η ω

=

= − ∇ = − ∇∑

where is a step size. However, evaluating the sum-gradient may require expensive evaluations of the gradients from all summand functions. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of summand functions at every step. This is very effective in the case of large-scale machine learning problems.

𝜂𝜂

Page 11: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

11

III. Discrete Optimal TransportStochastic gradient descent (SGD)

In stochastic (or "on-line") gradient descent, the true gradient of isapproximated by a gradient at a single example

𝑄𝑄 𝜔𝜔

( ): iQω ω η ω= − ∇

In pseudocode, stochastic gradient descent can be presented as follows:

Page 12: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

12

III. Discrete Optimal TransportAveraged stochastic gradient descent (Average SGD)

Invented independently by Ruppert and Polyak in the late 1980s, is ordinarystochastic gradient descent that records an average of its parameter vectorover time. That is, the update is the same as for ordinary stochastic gradientdescent, but the algorithm also keeps track of

1

0

1 t

iit

ω ω−

=

= ∑

When optimization is done, this averaged parameter vector takes the place of 𝜔𝜔

Stochastic averaged gradient (SAG): the stochastic average gradientmethod with a (user-supplied) constant step size.

Page 13: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

13

III. Discrete Optimal Transport

Stepsize

Initial gi =0

Update gradient

The flow chat of SAG for discrete OT

Output

Page 14: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

14

III. Discrete Optimal TransportNumerical Illustrations on Bags of Word-Embeddings

Page 15: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

15

IV. Semi-Discrete Optimal Transport

Stepsize

Update output

The flow chat of SGD for discrete OT

Output

Page 16: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

16

IV. Semi-Discrete Optimal TransportNumerical Illustrations

Page 17: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

17

IV. Continuous Optimal Transport

Page 18: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

18

IV. Continuous Optimal Transport

Stepsize C and Kernels

Update output

The flow chat of Kernel SGD for discrete OT

Page 19: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

19

IV. Continuous Optimal TransportNumerical Illustrations

Page 20: Stochastic Optimization for Large-scale Optimal Transport (OT) · 2018. 3. 27. · Stochastic Optimization for Large-scale Optimal Transport (OT) ... • This paper propose a new

20

Thank You!