Massively scalable Sinkhorn distances via the Nyström method · 2019-10-25


Massively scalable Sinkhorn distances via the Nyström method

J. Altschuler, F. Bach, J. Niles-Weed, A. Rudi

Optimal Transport

Increasingly popular tool in machine learning

Example applications (figures omitted):

- Document distances via word embeddings: Kusner et al. (2015)
- Developmental trajectories of single cells, from Day 0 (MEF) to Day 18 (IPSC, Neural, Stromal, Trophoblast), with low- vs. high-probability trajectories: Schiebinger et al. (2018)
- Shape and image registration: Feydy et al. (2017)
- Generative adversarial networks: Gulrajani et al. (2017)

Definitions

Points x1, …, xn ∈ ℝᵈ with ∥xi∥ ≤ R for all i.

p, q ∈ Δn: two probability distributions on the n points.

Wasserstein distance between p and q:
W(p, q) := min_{P ∈ ℳ(p,q)} ∑_{i,j} Pij ∥xi − xj∥²

Set of couplings between p and q:
ℳ(p, q) := {P ∈ ℝ₊ⁿˣⁿ : P1 = p, P⊤1 = q}

Slow to compute
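To make the definition concrete, here is a minimal sketch (not from the talk) that computes W(p, q) exactly by solving the linear program above with SciPy; the toy points and marginals are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: n = 3 points on the line, squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2          # C[i, j] = ||x_i - x_j||^2
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])

n = len(x)
# Equality constraints P @ 1 = p and P.T @ 1 = q, with P flattened row-major.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0        # row sums equal p
    A_eq[n + i, i::n] = 1.0                 # column sums equal q
b_eq = np.concatenate([p, q])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun)  # W(p, q) = 1.0: shift 0.5 mass from 0 to 1 and 0.5 from 1 to 2
```

Solving this LP with generic solvers is what costs O(n³) time as n grows.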

Sinkhorn distance

Wasserstein distance between p and q:
W(p, q) := min_{P ∈ ℳ(p,q)} ∑_{i,j} Pij ∥xi − xj∥²
Slow to compute: O(n³) time.

Sinkhorn "distance" [Cuturi '13]:
Wη(p, q) := min_{P ∈ ℳ(p,q)} ∑_{i,j} Pij ∥xi − xj∥² − η⁻¹ H(P),
where H(P) = ∑_{i,j} Pij log(1/Pij) is the entropy of P.
Faster to compute: O(n²) time.

Can we do better?

Our contribution

New algorithm (NYS-SINK) approximates the Sinkhorn distance in O(n) time and space

Guarantees automatically adaptive to low-dimensional structure

Outperforms standard approaches by orders of magnitude

Why Sinkhorn distance?

Minimizer: the unique matrix in ℳ(p, q) of the form D₁ · e^{−η∥xi−xj∥²} · D₂, where D₁, D₂ are positive diagonal matrices and the exponential is entrywise.

Sinkhorn scaling: easy iterative algorithm to find the minimizer
1. Rescale rows to match p
2. Rescale columns to match q
3. Repeat

Converges! [Sinkhorn '67]

Goal: find the (unique) Pη = D₁ e^{−η∥xi−xj∥²} D₂ ∈ ℳ(p, q) … in O(n² ε⁻²) time! [Altschuler, Weed, Rigollet '17]
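The rescaling loop above can be sketched in a few lines. This is a minimal illustration on toy data; the function name, parameters, and stopping rule are assumptions, not the paper's implementation.

```python
import numpy as np

def sinkhorn_scaling(x, p, q, eta=1.0, n_iter=1000):
    """Alternately rescale rows/columns of K_ij = exp(-eta * ||x_i - x_j||^2)
    so that row sums approach p and column sums approach q."""
    C = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    K = np.exp(-eta * C)
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u = p / (K @ v)      # step 1: rescale rows to match p
        v = q / (K.T @ u)    # step 2: rescale columns to match q
    return u[:, None] * K * v[None, :]   # P_eta = D1 K D2

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
p = np.full(50, 1 / 50)
q = np.full(50, 1 / 50)
P = sinkhorn_scaling(x, p, q)
# Column sums of P match q exactly (last update); row sums match p up to
# the remaining iteration error.
```

Each iteration is dominated by the two matrix-vector products with K, which is where the O(n²) per-iteration cost comes from.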

Prior analysis [AWR ’17]

O(1) iterations × O(n²) per iteration (matrix-vector product with the n×n kernel matrix e^{−η∥xi−xj∥²}) = O(n²) time

Idea #1

Replace the kernel matrix e^{−η∥xi−xj∥²} by a low-rank approximation (via the Nyström method).

O(1) iterations × O(n) per iteration (matrix-vector product with the low-rank factors) = O(n) time

Rigorous analysis needs new stability guarantees for Sinkhorn scaling.

Works well, but the error guarantee depends on the ambient dimension (could be large).

Idea #2

Choose the rank adaptively and pay only for the intrinsic dimension.

Rigorous analysis needs new interpolation guarantees for the Gaussian kernel.
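One way to realize "choose the rank adaptively" is to grow the landmark set until an estimated approximation error falls below a tolerance. The doubling schedule and probe-column error estimate below are illustrative assumptions, not the paper's adaptive scheme.

```python
import numpy as np

def gauss_kernel(x, y, eta):
    return np.exp(-eta * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1))

def adaptive_nystrom(x, eta, tol, rng, r0=8):
    """Double the number of landmarks until the Nystrom residual, estimated
    on a few random columns of K, drops below tol."""
    n = len(x)
    r = r0
    while True:
        S = rng.choice(n, size=r, replace=False)
        C = gauss_kernel(x, x[S], eta)            # n x r
        L = C @ np.linalg.pinv(C[S])              # K is approximated by L @ C.T
        probe = rng.choice(n, size=5, replace=False)
        err = np.abs(L @ C[probe].T - gauss_kernel(x, x[probe], eta)).max()
        if err < tol or r >= n:
            return L, C.T, r                      # rank r adapted to the data
        r = min(2 * r, n)

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2))
L, R, r = adaptive_nystrom(x, eta=1.0, tol=1e-3, rng=rng)
```

Data concentrated near a low-dimensional set needs few landmarks, so the rank, and hence the cost, tracks the intrinsic rather than the ambient dimension.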

Experiments

Comparison with existing approaches (r: approximation rank, T: number of iterations):
- comparable results
- orders-of-magnitude speedup

Comparison with Sinkhorn scaling (ε: desired accuracy):
- linear vs. quadratic runtime
- much larger problems solvable in memory