Optimization Algorithms
Andrew Ng, deeplearning.ai

Transcript of slides: mini-batch gradient descent, exponentially weighted averages, bias correction, momentum, RMSprop, Adam, learning rate decay, and the problem of local optima.

Optimization Algorithms
Mini-batch gradient descent
deeplearning.ai

Batch vs. mini-batch gradient descent
Vectorization allows you to efficiently compute on m examples.


Mini-batch gradient descent


Optimization Algorithms

Understanding mini-batch gradient descent
deeplearning.ai


Training with mini-batch gradient descent

Figure: cost vs. # iterations for batch gradient descent (decreases on every iteration), and cost vs. mini-batch number t for mini-batch gradient descent (trends downward, but with noise).
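The training loop behind these plots can be written as a minimal runnable sketch; the toy linear-regression data, step size, and helper name below are mine, not from the slides:

```python
import numpy as np

def minibatch_indices(m, batch_size, rng):
    """Shuffle m example indices and split them into mini-batches."""
    perm = rng.permutation(m)
    return [perm[i:i + batch_size] for i in range(0, m, batch_size)]

# Toy realizable problem: y is exactly linear in X, so w should recover true_w.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
alpha = 0.1
for epoch in range(50):                      # one epoch = one pass through the data
    for idx in minibatch_indices(len(X), batch_size=32, rng=rng):
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of mean squared error
        w -= alpha * grad                             # one step per mini-batch, not per epoch
```

With mini-batches of 32, each epoch takes 7 gradient steps instead of 1, which is why the cost curve is noisier but progress per pass is faster.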


Choosing your mini-batch size



Optimization Algorithms

Understanding exponentially weighted averages
deeplearning.ai


Exponentially weighted averages

Figure: temperature (y-axis) plotted against days (x-axis), with the exponentially weighted average drawn through the daily data.

v_t = β v_{t−1} + (1 − β) θ_t


Exponentially weighted averages

v_100 = 0.9 v_99 + 0.1 θ_100
v_99  = 0.9 v_98 + 0.1 θ_99
v_98  = 0.9 v_97 + 0.1 θ_98

v_t = β v_{t−1} + (1 − β) θ_t
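Unrolling this recursion with β = 0.9 shows why v_t is an exponentially weighted sum of recent temperatures:

```latex
\begin{aligned}
v_{100} &= 0.1\,\theta_{100} + 0.9\,v_{99} \\
        &= 0.1\,\theta_{100} + 0.1\cdot 0.9\,\theta_{99}
         + 0.1\cdot 0.9^{2}\,\theta_{98}
         + 0.1\cdot 0.9^{3}\,\theta_{97} + \cdots
\end{aligned}
```

The coefficients shrink by a factor of 0.9 per day, and since 0.9^10 ≈ 0.35 ≈ 1/e, v_t behaves roughly like an average over the last 1/(1 − β) = 10 days.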


Implementing exponentially weighted averages

v_0 = 0
v_1 = β v_0 + (1 − β) θ_1
v_2 = β v_1 + (1 − β) θ_2
v_3 = β v_2 + (1 − β) θ_3
…
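In code, the repeated update is a single variable carried through a loop; a minimal sketch (function and variable names are mine):

```python
def ewa(thetas, beta=0.9):
    """Exponentially weighted average: v_t = beta*v_{t-1} + (1-beta)*theta_t, with v_0 = 0."""
    v = 0.0
    out = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta   # overwrite v in place; no history needed
        out.append(v)
    return out

# On a constant series, the average climbs toward the constant from v_0 = 0.
print(ewa([10.0, 10.0, 10.0], beta=0.9))    # approximately [1.0, 1.9, 2.71]
```

Note that the early values are far below 10 even though every observation is 10; that startup bias is exactly what the next section's bias correction fixes.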


Optimization Algorithms

Bias correction in exponentially weighted averages
deeplearning.ai


Bias correction

Figure: temperature (y-axis) against days (x-axis); during the first few days the uncorrected average runs well below the data because v_0 = 0.

v_t = β v_{t−1} + (1 − β) θ_t
Bias-corrected estimate: v_t / (1 − β^t)
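Dividing by (1 − β^t) rescales the early estimates, and the correction factor approaches 1 as t grows, so it only matters during warm-up. A sketch (names mine):

```python
def ewa_bias_corrected(thetas, beta=0.9):
    """Exponentially weighted average with bias correction: v_t / (1 - beta**t)."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):   # t starts at 1
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))     # undo the bias toward v_0 = 0
    return corrected

# With correction, a constant series is estimated exactly from the first step.
print(ewa_bias_corrected([10.0, 10.0, 10.0]))     # approximately [10.0, 10.0, 10.0]
```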


Optimization Algorithms

Gradient descent with momentum
deeplearning.ai


Gradient descent example


Implementation details

On iteration t:
    Compute dW, db on the current mini-batch
    v_dW = β v_dW + (1 − β) dW
    v_db = β v_db + (1 − β) db
    W = W − α v_dW
    b = b − α v_db

Hyperparameters: α, β. β = 0.9 is a common default.
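The same update in runnable form (only W is shown; b is analogous; the toy objective is mine, not from the slide):

```python
import numpy as np

def momentum_step(W, v_dW, dW, alpha=0.1, beta=0.9):
    """One step of gradient descent with momentum."""
    v_dW = beta * v_dW + (1 - beta) * dW   # exponentially weighted average of gradients
    W = W - alpha * v_dW                   # move along the smoothed direction
    return W, v_dW

# Toy check: minimize f(W) = 0.5*||W||^2, whose gradient is W itself.
W = np.array([5.0, -3.0])
v = np.zeros_like(W)
for _ in range(500):
    W, v = momentum_step(W, v, dW=W)
```

Averaging the gradients damps oscillations in directions where the gradient keeps flipping sign, while steps in a consistent direction accumulate.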


Optimization Algorithms

RMSprop
deeplearning.ai


RMSprop
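RMSprop (root mean square prop) keeps an exponentially weighted average of the squared gradients and divides each gradient by its square root, so parameters with large gradients take proportionally smaller steps. A sketch of the standard update (variable names mine):

```python
import numpy as np

def rmsprop_step(W, s_dW, dW, alpha=0.01, beta2=0.999, eps=1e-8):
    """One RMSprop step: scale each gradient by the root of its running mean square."""
    s_dW = beta2 * s_dW + (1 - beta2) * dW ** 2    # EWA of squared gradients
    W = W - alpha * dW / (np.sqrt(s_dW) + eps)     # eps guards against division by zero
    return W, s_dW

# Two parameters whose gradients differ by a factor of 100:
W = np.array([5.0, 5.0])
s = np.zeros_like(W)
dW = np.array([1.0, 100.0])
W2, s2 = rmsprop_step(W, s, dW)
# After normalization, both coordinates take steps of nearly equal size.
```

This per-parameter rescaling is what lets you use a larger learning rate without the large-gradient direction diverging.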


Optimization Algorithms

Adam optimization algorithm
deeplearning.ai


Adam optimization algorithm

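Adam combines momentum's first-moment average with RMSprop's second-moment average and bias-corrects both. A sketch of the standard update (names and the toy objective are mine):

```python
import numpy as np

def adam_step(W, v, s, dW, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter W at step t (t starts at 1)."""
    v = beta1 * v + (1 - beta1) * dW         # momentum-style first moment
    s = beta2 * s + (1 - beta2) * dW ** 2    # RMSprop-style second moment
    v_hat = v / (1 - beta1 ** t)             # bias correction for both averages
    s_hat = s / (1 - beta2 ** t)
    W = W - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return W, v, s

# Toy check: minimize f(W) = 0.5*||W||^2 (gradient is W itself).
W = np.array([5.0, -3.0])
v = np.zeros_like(W)
s = np.zeros_like(W)
for t in range(1, 3001):
    W, v, s = adam_step(W, v, s, dW=W, t=t, alpha=0.01)
```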


Hyperparameters choice:
α: needs to be tuned
β₁: 0.9 (default)
β₂: 0.999 (default)
ε: 10⁻⁸

Adam stands for adaptive moment estimation (it is not named after Adam Coates).


Optimization Algorithms

Learning rate decay
deeplearning.ai


Learning rate decay
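The decay rule taught in this lecture sets the rate for each epoch from an initial rate α₀: α = α₀ / (1 + decay_rate · epoch_num), where one epoch is one pass through the data. A sketch:

```python
def decayed_lr(alpha0, decay_rate, epoch_num):
    """alpha = alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1 + decay_rate * epoch_num)

# With alpha0 = 0.2 and decay_rate = 1, the rate falls epoch by epoch:
# epoch 1 -> 0.1, epoch 2 -> 0.0667, epoch 3 -> 0.05, ...
rates = [decayed_lr(0.2, 1.0, e) for e in range(1, 4)]
```

Shrinking α over time lets the mini-batch noise average out: early large steps make fast progress, later small steps settle into a tighter region around the minimum.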



Other learning rate decay methods
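Other schedules mentioned in this lecture include exponential decay, 1/√epoch decay, and a discrete staircase (plus decaying manually); sketches of each, with illustrative constants:

```python
import math

def exponential_decay(alpha0, epoch_num, base=0.95):
    """alpha = base**epoch_num * alpha0 (exponentially decaying rate)."""
    return (base ** epoch_num) * alpha0

def sqrt_decay(alpha0, epoch_num, k=1.0):
    """alpha = k / sqrt(epoch_num) * alpha0."""
    return k / math.sqrt(epoch_num) * alpha0

def staircase_decay(alpha0, epoch_num, drop=0.5, step=10):
    """Discrete staircase: cut the rate by `drop` every `step` epochs."""
    return alpha0 * drop ** (epoch_num // step)
```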


Optimization Algorithms

The problem of local optima
deeplearning.ai


Local optima in neural networks


Problem of plateaus

• Unlikely to get stuck in a bad local optimum
• Plateaus can make learning slow