Semi-Stochastic Gradient Descent Methods

Transcript of Semi-Stochastic Gradient Descent Methods

Page 1: Semi-Stochastic  Gradient Descent Methods

Semi-Stochastic Gradient Descent Methods

Jakub Konečný (joint work with Peter Richtárik), University of Edinburgh

Page 2: Semi-Stochastic  Gradient Descent Methods

Introduction

Page 3: Semi-Stochastic  Gradient Descent Methods

Large scale problem setting

Problems are often structured, frequently arising in machine learning.

Structure – a sum of functions: f(x) = (1/n) Σ_{i=1}^n f_i(x), where n is BIG.

Page 4: Semi-Stochastic  Gradient Descent Methods

Examples

Linear regression (least squares): f_i(x) = (1/2)(a_iᵀx − b_i)²

Logistic regression (classification): f_i(x) = log(1 + exp(−b_i a_iᵀx))
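To make the finite-sum structure concrete, here is a small NumPy sketch of the two example objectives. The data matrix A, the labels b, and the 1/n averaging are illustrative assumptions rather than quotes from the slides.

```python
import numpy as np

def least_squares_fi(x, a_i, b_i):
    """One component f_i(x) = 0.5 * (a_i^T x - b_i)^2 of the least-squares objective."""
    return 0.5 * (a_i @ x - b_i) ** 2

def logistic_fi(x, a_i, b_i):
    """One component f_i(x) = log(1 + exp(-b_i * a_i^T x)), with labels b_i in {-1, +1}."""
    return np.log1p(np.exp(-b_i * (a_i @ x)))

def f(x, A, b, component=least_squares_fi):
    """Full objective f(x) = (1/n) * sum_i f_i(x), one component per row of A."""
    n = A.shape[0]
    return sum(component(x, A[i], b[i]) for i in range(n)) / n
```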

Page 5: Semi-Stochastic  Gradient Descent Methods

Assumptions

Lipschitz continuity of the derivative of each f_i: ‖∇f_i(x) − ∇f_i(y)‖ ≤ L‖x − y‖

Strong convexity of f: f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (μ/2)‖y − x‖²

Page 6: Semi-Stochastic  Gradient Descent Methods

Gradient Descent (GD)

Update rule: x_{k+1} = x_k − h ∇f(x_k)

Fast (linear) convergence rate: f(x_k) − f(x*) ≤ (1 − μ/L)^k (f(x_0) − f(x*)) for a suitable stepsize h.

Alternatively, for accuracy ε we need O((L/μ) log(1/ε)) iterations.

Complexity of a single iteration – n (measured in gradient evaluations).
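A minimal gradient descent loop matching the update rule above; grad_f (the full gradient, which touches all n components) and the fixed stepsize h are placeholders to be supplied by the user.

```python
import numpy as np

def gradient_descent(grad_f, x0, h, num_iters):
    """Plain GD: x_{k+1} = x_k - h * grad_f(x_k).

    Each call to grad_f evaluates all n component gradients,
    so one iteration costs n gradient evaluations.
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        x = x - h * grad_f(x)
    return x
```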

Page 7: Semi-Stochastic  Gradient Descent Methods

Stochastic Gradient Descent (SGD)

Update rule: x_{k+1} = x_k − h_k ∇f_i(x_k), where i is chosen uniformly at random from {1, …, n} and h_k is a step-size parameter.

Why it works: E[∇f_i(x)] = ∇f(x), so each step follows the true gradient in expectation.

Slow (sublinear) convergence.

Complexity of a single iteration – 1 (measured in gradient evaluations).
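For contrast, a minimal SGD loop: grad_fi(x, i) returns the gradient of a single component f_i, so one iteration costs a single gradient evaluation. The decaying step-size schedule below is an illustrative choice, not one taken from the slides.

```python
import numpy as np

def sgd(grad_fi, n, x0, h0, num_iters, seed=0):
    """SGD: x_{k+1} = x_k - h_k * grad_fi(x_k, i) with i chosen uniformly at random.

    The stochastic gradient is an unbiased estimate of the full gradient.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(num_iters):
        i = rng.integers(n)          # pick one component uniformly at random
        h_k = h0 / (k + 1)           # illustrative decaying step-size schedule
        x = x - h_k * grad_fi(x, i)
    return x
```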

Page 8: Semi-Stochastic  Gradient Descent Methods

Goal

GD: fast convergence, but n gradient evaluations in each iteration.

SGD: slow convergence, but the complexity of each iteration is independent of n.

Combine the strengths of both in a single algorithm.

Page 9: Semi-Stochastic  Gradient Descent Methods

Semi-Stochastic Gradient Descent

S2GD

Page 10: Semi-Stochastic  Gradient Descent Methods

Intuition

The gradient does not change drastically between nearby points, so we could reuse the information from an “old” gradient.

Page 11: Semi-Stochastic  Gradient Descent Methods

Modifying the “old” gradient

Imagine someone gives us a “good” point y and the full gradient ∇f(y).

The gradient at a point x, near y, can be expressed as ∇f(x) = ∇f(y) + (∇f(x) − ∇f(y)).

Approximation of the gradient: ∇f(x) ≈ ∇f(y) + ∇f_i(x) − ∇f_i(y), for a randomly chosen index i.

Here ∇f(y) is the already computed gradient, and ∇f_i(x) − ∇f_i(y) is a cheap estimate of the gradient change, which we can try to use in place of the exact change.
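The approximation above amounts to a one-line estimator. In the sketch below, full_grad_y stands for the already computed gradient ∇f(y) and grad_fi returns a single component gradient; the function and argument names are my placeholders.

```python
def s2gd_direction(grad_fi, x, y, full_grad_y, i):
    """Unbiased estimate of grad f(x):
    grad f(y) + (grad f_i(x) - grad f_i(y)),
    i.e. the old full gradient plus a cheap estimate of the gradient change.
    """
    return full_grad_y + grad_fi(x, i) - grad_fi(y, i)
```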

Page 12: Semi-Stochastic  Gradient Descent Methods

The S2GD Algorithm

In each epoch, compute the full gradient ∇f(y_j) at the current point y_j, then take a random number of cheap inner steps of the form x ← x − h (∇f(y_j) + ∇f_i(x) − ∇f_i(y_j)), and set y_{j+1} to the last inner iterate.

Simplification: the size of the inner loop is random, following a geometric rule.
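A minimal sketch of the S2GD outer/inner loop structure described above. The geometric weights (1 − νh)^{m−t} for the inner loop length, and the parameter names nu, h, m, are my reading of the "geometric rule" and should be treated as assumptions rather than the slides' exact formulas.

```python
import numpy as np

def s2gd(grad_f, grad_fi, n, x0, h, m, nu, num_epochs, seed=0):
    """Semi-Stochastic Gradient Descent (sketch).

    Outer loop: compute the full gradient at y_j.
    Inner loop: a random number t of cheap steps using the
    variance-reduced direction grad_f(y) + grad_f_i(x) - grad_f_i(y).
    """
    rng = np.random.default_rng(seed)
    y = np.array(x0, dtype=float)
    # probabilities of inner-loop length t = 1, ..., m (geometric rule, assumed form)
    weights = np.array([(1.0 - nu * h) ** (m - t) for t in range(1, m + 1)])
    probs = weights / weights.sum()
    for _ in range(num_epochs):
        g = grad_f(y)                        # one full gradient: n evaluations
        t = rng.choice(np.arange(1, m + 1), p=probs)
        x = y.copy()
        for _ in range(t):                   # t cheap iterations: 2 evaluations each
            i = rng.integers(n)
            x = x - h * (g + grad_fi(x, i) - grad_fi(y, i))
        y = x                                # next epoch starts from the last iterate
    return y
```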

Page 13: Semi-Stochastic  Gradient Descent Methods

Theorem

S2GD enjoys linear convergence in expectation: E[f(y_j) − f(x*)] ≤ c^j (f(y_0) − f(x*)), where the factor c < 1 depends on the stepsize h, the inner loop size m, and the constants μ and L.

Page 14: Semi-Stochastic  Gradient Descent Methods

Convergence rate

How to set the parameters h (stepsize) and m (inner loop size)?

One part of the convergence factor c can be made arbitrarily small by decreasing the stepsize h.

For any fixed h, the remaining part can be made arbitrarily small by increasing the inner loop size m.

Page 15: Semi-Stochastic  Gradient Descent Methods

Setting the parameters

Fix a target accuracy ε.

The accuracy ε is achieved by a suitable choice of the # of epochs, the stepsize h, and the # of inner iterations m.

Total complexity (in gradient evaluations): (# of epochs) × (one full gradient evaluation + the cheap inner iterations).
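As a rough illustration only: the snippet below uses common SVRG-style heuristics (stepsize a small fraction of 1/L, inner loop size proportional to the condition number, about log(1/ε) epochs). These are my assumptions, not the exact formulas from this slide.

```python
import math

def s2gd_parameters(L, mu, eps, stepsize_factor=0.1):
    """Illustrative heuristic (assumption, not the slide's exact formulas):
    h ~ 0.1 / L, m ~ a few multiples of the condition number L / mu,
    and roughly log(1/eps) epochs."""
    h = stepsize_factor / L
    kappa = L / mu
    m = int(math.ceil(20 * kappa))                  # inner loop size, O(condition number)
    num_epochs = int(math.ceil(math.log(1.0 / eps)))
    return h, m, num_epochs
```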

Page 16: Semi-Stochastic  Gradient Descent Methods

Complexity

S2GD complexity: O(log(1/ε)) epochs × O(n + κ) gradient evaluations per epoch = O((n + κ) log(1/ε)) in total, where κ = L/μ is the condition number.

GD complexity: O(κ log(1/ε)) iterations × n gradient evaluations per iteration = O(nκ log(1/ε)) in total.

Page 17: Semi-Stochastic  Gradient Descent Methods

Related Methods

SAG – Stochastic Average Gradient (Mark Schmidt, Nicolas Le Roux, Francis Bach, 2013). Refreshes a single stochastic gradient in each iteration; needs to store n gradients. Similar convergence rate; cumbersome analysis.

MISO – Minimization by Incremental Surrogate Optimization (Julien Mairal, 2014). Similar to SAG, with slightly worse performance; elegant analysis.

Page 18: Semi-Stochastic  Gradient Descent Methods

Related Methods

SVRG – Stochastic Variance Reduced Gradient (Rie Johnson, Tong Zhang, 2013). Arises as a special case of S2GD.

Prox-SVRG (Tong Zhang, Lin Xiao, 2014). Extends SVRG to the proximal setting.

EMGD – Epoch Mixed Gradient Descent (Lijun Zhang, Mehrdad Mahdavi, Rong Jin, 2013). Handles simple constraints; worse convergence rate.

Page 19: Semi-Stochastic  Gradient Descent Methods

Experiment

Example problem, with