
Peter Richtárik (joint work with Martin Takáč)

Distributed Coordinate Descent Method

AmpLab All Hands Meeting - Berkeley - October 29, 2013

Randomized Coordinate Descent in 2D

2D Optimization

Goal: find the minimizer.

[Figure: contours of a function in 2D; the minimizer lies at the center of the contours.]

Randomized Coordinate Descent in 2D

[Figure sequence: starting from an initial point, each iteration picks one of the two coordinate directions at random (east-west or north-south) and moves to the minimizer along that line; after steps 1 through 7 the iterates reach the minimizer. SOLVED!]
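A minimal sketch of the iteration pictured above, for a simple smooth function in 2D; the test problem, the 1/L_i step size, and the fixed iteration count are illustrative assumptions, not taken from the talk:

```python
import numpy as np

def randomized_cd_2d(grad, lipschitz, x0, iters=50, seed=0):
    """Randomized coordinate descent: at every step pick one coordinate
    uniformly at random and take a 1/L_i gradient step along it."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    L = np.asarray(lipschitz, dtype=float)
    for _ in range(iters):
        i = rng.integers(x.size)       # random coordinate: an E/W or N/S move
        x[i] -= grad(x)[i] / L[i]      # exact line minimizer for a separable quadratic
    return x

# Illustrative problem: f(x) = 0.5*(a1*x1^2 + a2*x2^2); the minimizer is the origin.
a = np.array([1.0, 4.0])
print(randomized_cd_2d(lambda x: a * x, lipschitz=a, x0=[3.0, -2.0]))
```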

Convergence of Randomized Coordinate Descent

Regimes: strongly convex f; smooth or 'simple' nonsmooth f; 'difficult' nonsmooth f.

Focus on the dependence on d (big data = big d).

Parallelization Dream

Serial vs. Parallel

In reality we get something in between.

How (Not) to Parallelize Coordinate Descent

"Naive" parallelization: do the same thing as before, but with more (or all) coordinates, and simply add up the updates (see the sketch below).
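A sketch of this naive scheme in the same illustrative setting: each of the τ selected coordinates is updated as if it were the only one changing, and the updates are simply summed. The toy problem below is chosen to reproduce the oscillation shown on the next slides:

```python
import numpy as np

def naive_parallel_cd(grad, lipschitz, x0, tau, iters=20, seed=0):
    """'Naive' parallel coordinate descent: pick tau coordinates, compute each
    update exactly as in the serial method, and add all the updates up at once."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    L = np.asarray(lipschitz, dtype=float)
    for _ in range(iters):
        S = rng.choice(x.size, size=tau, replace=False)  # coordinates updated in parallel
        g = grad(x)                                      # all updates use the same old iterate
        for i in S:
            x[i] -= g[i] / L[i]                          # each update ignores the others
    return x

# f(x) = (x1 + x2)^2: the summed step flips the sign of (x1 + x2), so the
# iterates bounce between two points forever instead of converging (OOPS!).
grad = lambda x: 2.0 * (x[0] + x[1]) * np.ones(2)
print(naive_parallel_cd(grad, lipschitz=[2.0, 2.0], x0=[1.0, 0.0], tau=2))
```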

Failure of naive parallelization

[Figure sequence: from the point 0, the two per-coordinate updates (1a, 1b) are each computed as if the other coordinate stayed fixed; summing them overshoots to point 1, the next summed step (2a + 2b) jumps back to point 2, and the iterates keep oscillating without ever converging. OOPS!]

Idea: averaging updates may help

[Figure: the averaged update lands at the minimizer. SOLVED!]

Averaging can be too conservative

[Figure sequence: with averaging, steps 1, 2, ... from the point 0 make only tiny progress toward the minimizer, and so on; the averaged step (BAD!!!) is much shorter than the step we actually want (WANT).]
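For intuition, the same toy setup with the summed step replaced by its average; this is only an illustration of why averaging is safe but conservative, not the method proposed in the talk:

```python
import numpy as np

def averaged_parallel_step(x, grad, lipschitz, tau):
    """One step where the tau summed updates are replaced by their average."""
    g = grad(x)
    return x - (1.0 / tau) * g / np.asarray(lipschitz, dtype=float)

# On f(x) = (x1 + x2)^2 the averaged step lands exactly on the minimizer (SOLVED!),
# but on a separable problem like f(x) = 0.5*||x||^2 every step is tau times
# shorter than it needs to be (BAD!!!).
grad = lambda x: 2.0 * (x[0] + x[1]) * np.ones(2)
print(averaged_parallel_step(np.array([1.0, 0.0]), grad, [2.0, 2.0], tau=2))  # -> [0.5, -0.5]
```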

Minimizing Regularized Loss

Loss: convex (smooth).

Regularizer: convex (smooth or nonsmooth), separable, and allowed to take the value +∞ (so constraints can be encoded).
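In symbols, the problem on this slide is the composite form used throughout the Richtárik-Takáč line of work; the notation below (F, f, Ω, Ω_i) is mine, chosen to match the labels above:

```latex
\min_{x \in \mathbb{R}^d} \; F(x) \;:=\;
\underbrace{f(x)}_{\text{loss: convex, smooth}}
\;+\;
\underbrace{\Omega(x)}_{\text{regularizer: convex, separable}},
\qquad
\Omega(x) \;=\; \sum_{i=1}^{d} \Omega_i(x_i),
\qquad
\Omega_i : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}.
```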

Regularizer: examples

- No regularizer
- Weighted L1 norm (e.g., LASSO)
- Weighted L2 norm
- Box constraints (e.g., SVM dual)
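As formulas, the listed regularizers are typically instantiated as follows; the weights λ_i and the box [l_i, u_i] are generic placeholders, not values from the talk:

```latex
\Omega(x) = 0, \qquad
\Omega(x) = \sum_i \lambda_i |x_i| \ \ (\text{weighted } \ell_1,\ \text{e.g., LASSO}), \qquad
\Omega(x) = \tfrac{1}{2}\sum_i \lambda_i x_i^2 \ \ (\text{weighted } \ell_2), \qquad
\Omega(x) = \sum_i \mathbb{I}_{[l_i,\,u_i]}(x_i) \ \ (\text{box constraints, e.g., SVM dual}).
```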

Structure of f

Considered in [BKBG, ICML 2011]

Loss: examples

- Quadratic loss
- L-infinity
- L1 regression
- Exponential loss
- Logistic loss
- Square hinge loss

References: BKBG'11, RT'11b, TBRS'13, RT'13a, FR'13
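For concreteness, typical forms of a few of the listed losses, written for data pairs (a_j, b_j); these conventions are standard but not copied from the slides:

```latex
f(x) = \tfrac{1}{2}\sum_j \bigl(a_j^\top x - b_j\bigr)^2 \ \ (\text{quadratic}), \qquad
f(x) = \sum_j \log\bigl(1 + e^{-b_j a_j^\top x}\bigr) \ \ (\text{logistic}), \qquad
f(x) = \sum_j \max\bigl(0,\, 1 - b_j a_j^\top x\bigr)^2 \ \ (\text{square hinge}).
```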

Distributed Coordinate Descent Method

I. Distribution of Data

d = # features / variables / coordinates. The data matrix is partitioned across the nodes by coordinate (column) blocks.
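A minimal sketch of the data distribution step described here, assuming the coordinates (columns of the data matrix) are split evenly and at random across the c nodes; the NumPy representation and function name are illustrative:

```python
import numpy as np

def partition_columns(A, c, seed=0):
    """Split the d columns (coordinates) of the data matrix A across c nodes."""
    d = A.shape[1]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(d)                     # random, roughly balanced partition
    blocks = np.array_split(perm, c)              # coordinate indices owned by each node
    return [(idx, A[:, idx]) for idx in blocks]   # each node stores only its own columns
```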

II. Choice of Coordinates

A random set of coordinates (a 'sampling') is chosen at each iteration.

III. Computing Updates to Selected Coordinates

Given the random set of coordinates, each selected coordinate i receives an update, taking the current iterate to the new iterate. All nodes need to be able to compute this update (communication); see the sketch below.
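A schematic of one round as described on this slide, simulating the c nodes in a single process: each node samples τ coordinates from its own block, computes per-coordinate updates, and the nodes then exchange only the change in the residual so that everyone can compute the next updates. The quadratic loss without regularizer, the 1/(β·wᵢ) style step, and the summation standing in for an all-reduce are assumptions for illustration, not the exact Hydra formulas; `node_blocks` is the output of the `partition_columns` sketch above:

```python
import numpy as np

def distributed_cd_round(x, residual, A, node_blocks, tau, beta_w, rng):
    """One synchronous round: every node updates tau coordinates from its own block.
    `residual` = A @ x - b is the shared state that all nodes keep in sync."""
    delta_residual = np.zeros_like(residual)
    for idx, _ in node_blocks:                        # each node works independently
        S = rng.choice(idx, size=tau, replace=False)  # tau random local coordinates
        for i in S:
            g_i = A[:, i] @ residual                  # partial derivative of 0.5*||Ax - b||^2
            delta = -g_i / beta_w[i]                  # damped step, 1/(beta * w_i) style
            x[i] += delta
            delta_residual += delta * A[:, i]         # this node's contribution to A @ x
    residual += delta_residual                        # "all-reduce": broadcast the change
    return x, residual
```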

Iteration Complexity

Theorem [RT'13]: a bound on the number of iterations that implies an accuracy guarantee. Quantities appearing in the bound:

- # coordinates
- # nodes
- # coordinates updated / node
- strong convexity constant of the loss f
- strong convexity constant of the regularizer

Theorem [RT'13]: the bound also depends on the spectral norm of the "partitioning"; bad partitioning at most doubles the # of iterations.
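Schematically, and only to show which quantities enter where, the bound on this slide has roughly the following shape; the exact constants and the precise probabilistic statement are in [RT'13]:

```latex
k \;\gtrsim\; \frac{d}{c\,\tau}\cdot\frac{\beta}{\mu_f + \mu_\Omega}\cdot\log\frac{1}{\epsilon}
\quad\Longrightarrow\quad
F(x_k) - F^\ast \le \epsilon \ \text{with high probability},
```

where d = # coordinates, c = # nodes, τ = # coordinates updated per node, μ_f and μ_Ω are the strong convexity constants of the loss and the regularizer, and β depends on the spectral norm of the "partitioning" (bad partitioning at most doubles it).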

Experiment 1

1 node (c = 1). LASSO problem: n = 2 billion, d = 1 billion.

[Plots: progress measured in coordinate updates, iterations, and wall time.]

Experiment 2

128 nodes (c = 512, 4096 cores). LASSO problem: n = 1 billion, d = 0.5 billion, data size = 3 TB.

[Plot: LASSO with 3 TB of data on 128 nodes.]