Lecture 19: NN Regularization (GitHub Pages, 2019-08-29)
CS109A Introduction to Data Science
Pavlos Protopapas and Kevin Rader
Lecture 19: NN Regularization
CS109A, PROTOPAPAS, RADER
Outline
Regularization of NN
§ Norm Penalties
§ Early Stopping
§ Data Augmentation
§ Sparse Representation
§ Bagging
§ Dropout
Regularization
Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.
Overfitting
Fitting a deep neural network with 5 layers and 100 neurons per layer can yield very good predictions on the training set but poor predictions on the validation set.
Norm Penalties
We used to optimize J(W; X, y). Change to:
J̃(W; X, y) = J(W; X, y) + αΩ(W)
(biases are not penalized)
L2 regularization, Ω(W) = ½∥W∥₂²:
– weight decay
– MAP estimation with a Gaussian prior
L1 regularization, Ω(W) = ∥W∥₁:
– encourages sparsity
– MAP estimation with a Laplacian prior
Norm Penalties
The plain gradient-descent update is
W⁽ᵗ⁺¹⁾ = W⁽ᵗ⁾ − λ ∂J/∂W
With the L2 penalty, J̃(W; X, y) = J(W; X, y) + ½α∥W∥₂², the update becomes
W⁽ᵗ⁺¹⁾ = W⁽ᵗ⁾ − λ ∂J/∂W − λαW⁽ᵗ⁾
Each weight decays in proportion to its size (hence "weight decay"). Biases are not penalized.
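The update above can be sketched in NumPy on a made-up least-squares loss (the data, learning rate λ, and penalty strength α are all arbitrary choices for illustration); the penalized step subtracts λαW on top of the plain gradient step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))        # made-up data
y = rng.normal(size=20)

lam, alpha = 0.01, 5.0              # learning rate and penalty strength (arbitrary)
W = np.zeros(5)                     # penalized weights
W_plain = np.zeros(5)               # unpenalized weights, for comparison

for _ in range(500):
    W = W - lam * (X.T @ (X @ W - y)) - lam * alpha * W   # gradient step + decay term
    W_plain = W_plain - lam * (X.T @ (X @ W_plain - y))   # plain gradient step

print(np.linalg.norm(W), np.linalg.norm(W_plain))         # the penalized weights are smaller
```

Running both updates from the same start shows the decay term shrinking the weight norm relative to plain gradient descent.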
Norm Penalties as Constraints
Useful if K is known in advance:
min_{Ω(W) ≤ K} J(W; X, y)
Optimization:
• Construct the Lagrangian and apply gradient descent
• Projected gradient descent
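A minimal sketch of projected gradient descent for the L2-ball constraint ∥W∥₂ ≤ K (NumPy; the data, K, and learning rate are made up): take a plain gradient step, then project back onto the constraint set by rescaling.

```python
import numpy as np

def project_l2_ball(W, K):
    """Project W onto the constraint set {W : ||W||_2 <= K} by rescaling."""
    norm = np.linalg.norm(W)
    return W if norm <= K else W * (K / norm)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))        # made-up data
y = rng.normal(size=30)

K, lam = 0.5, 0.01                  # constraint radius and learning rate (arbitrary)
W = np.zeros(4)
for _ in range(300):
    W = project_l2_ball(W - lam * (X.T @ (X @ W - y)), K)  # step, then project back

print(np.linalg.norm(W))            # never exceeds K
```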
Early Stopping
Training time can be treated as a hyperparameter.
Early stopping: stop training as soon as validation-set performance stops improving.
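A sketch of the idea in NumPy (the data, learning rate, and patience value are made up): track validation loss each epoch, checkpoint the best weights, and terminate after a few epochs without improvement.

```python
import numpy as np

rng = np.random.default_rng(2)
X_tr, y_tr = rng.normal(size=(50, 3)), rng.normal(size=50)     # made-up training set
X_val, y_val = rng.normal(size=(20, 3)), rng.normal(size=20)   # made-up validation set

def val_loss(W):
    return float(np.mean((X_val @ W - y_val) ** 2))

W = np.zeros(3)
best_W, best_loss = W.copy(), val_loss(W)
patience, bad_epochs = 5, 0                        # tolerate 5 non-improving epochs
for epoch in range(500):
    grad = X_tr.T @ (X_tr @ W - y_tr) / len(y_tr)
    W -= 0.05 * grad
    loss = val_loss(W)
    if loss < best_loss:
        best_W, best_loss, bad_epochs = W.copy(), loss, 0   # checkpoint the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                 # validation stopped improving: stop
            break

# best_W (the checkpoint), not the final W, is what early stopping returns
```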
Data Augmentation
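Data augmentation enlarges the training set with label-preserving transformations of the inputs. A minimal NumPy sketch for image-like data (the flip-and-shift recipe and all values are illustrative assumptions, not the lecture's specific pipeline):

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy: a horizontal flip and a small shift."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)             # horizontal flip (label-preserving for many tasks)
    shift = rng.integers(-2, 3)          # shift by up to 2 pixels
    out = np.roll(out, shift, axis=1)
    return out

rng = np.random.default_rng(3)
image = rng.random((8, 8))               # stand-in for a training image
batch = np.stack([augment(image, rng) for _ in range(4)])
print(batch.shape)                       # four augmented copies of one image
```

Each augmented copy contains the same pixel values rearranged, so the label can be reused unchanged.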
Sparse Representation
[Figure: a network with weights W₁, …, W₆ and output-layer weights W_Y. The unregularized loss is J(W; X, y); the output is computed as
4.34 = [3.2, 2.0, 1.8] · [2, −2.2, 1.3]ᵀ,
where [3.2, 2.0, 1.8] are the output-layer weights W_Y and [2, −2.2, 1.3] are the hidden-layer activations.]
Sparse Representation
With a norm penalty on the weights, J̃(W; X, y) = J(W; X, y) + αΩ(W), the weights in the output layer shrink:
0.69 = [0.5, 0.2, 0.1] · [2, −2.2, 1.3]ᵀ
[Figure: the same network; the penalized output-layer weights W_Y are now [0.5, 0.2, 0.1].]
Sparse Representation
With the unregularized loss J(W; X, y), now label the hidden-layer activations h₁₁, h₁₂, h₁₃:
4.34 = [3.2, 2.0, 1.8] · [2, −2.2, 1.3]ᵀ,
where [2, −2.2, 1.3] are the activations (h₁₁, h₁₂, h₁₃).
Sparse Representation
With a penalty on the output h of the hidden layer, J̃(W; X, y) = J(W; X, y) + αΩ(h), the hidden activations themselves become sparse:
1.22 = [3.2, 2.0, 1.8] · [0, −0.2, 0.9]ᵀ
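The penalty Ω(h) is typically an L1 norm on the activations. A sketch using the activation values from the example above (the base loss value J = 1.0 and α = 0.1 are arbitrary):

```python
import numpy as np

def penalized_loss(J, h, alpha):
    """J~ = J + alpha * ||h||_1: an L1 penalty on the hidden activations h."""
    return J + alpha * np.sum(np.abs(h))

h_dense = np.array([2.0, -2.2, 1.3])    # dense activations, as in the first example
h_sparse = np.array([0.0, -0.2, 0.9])   # sparse activations after regularization
# For the same data loss J, the sparse representation pays a smaller penalty
print(penalized_loss(1.0, h_dense, alpha=0.1), penalized_loss(1.0, h_sparse, alpha=0.1))
```

Because the penalty charges for every nonzero activation, training is pushed toward representations where many hidden units are exactly or nearly zero.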
Noise Robustness
Random perturbation of network weights:
• Gaussian noise: equivalent to minimizing the loss with a regularization term
• Encourages a smooth function: small perturbations of the weights lead to small changes in the output
Injecting noise into the output labels:
• Better convergence: prevents the pursuit of hard 0/1 probabilities
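One standard way to inject noise into the output labels is label smoothing; here is a minimal sketch (the smoothing rate ε = 0.1 is a typical but arbitrary choice):

```python
import numpy as np

def smooth_labels(y_onehot, eps=0.1):
    """Soften hard 0/1 targets: the true class gets 1 - eps + eps/k, others get eps/k."""
    k = y_onehot.shape[-1]
    return y_onehot * (1.0 - eps) + eps / k

y = np.array([[0.0, 1.0, 0.0]])     # a hard one-hot label over k = 3 classes
s = smooth_labels(y)
print(s)                            # each row still sums to 1
```

Since the softened targets are never exactly 0 or 1, the model is no longer pushed toward infinitely confident (hard) probabilities.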
Dropout
Dropout trains all sub-networks obtained by removing non-output units from the base network.
Dropout: Stochastic GD
For each new example/mini-batch:
• Randomly sample a binary mask μ, with each μᵢ drawn independently, where μᵢ indicates whether input/hidden node i is included
• Multiply the output of node i by μᵢ and perform the gradient update
Typically an input node is included with probability 0.8 and a hidden node with probability 0.5.
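The mask-sampling step can be sketched in NumPy (the activation values are made up; p_keep = 0.5 matches the typical hidden-node setting above):

```python
import numpy as np

def dropout_forward(h, p_keep, rng):
    """Sample a binary mask mu (mu_i = 1 means node i is included) and apply it."""
    mu = (rng.random(h.shape) < p_keep).astype(h.dtype)
    return h * mu, mu                # mu is reused when computing the gradient update

rng = np.random.default_rng(4)
h = np.array([1.0, -2.0, 0.5, 3.0])              # made-up hidden activations
h_dropped, mu = dropout_forward(h, p_keep=0.5, rng=rng)
print(mu, h_dropped)
```

Each mini-batch samples a fresh mask, so every update trains a different sub-network that shares weights with the base network.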
Dropout: Weight Scaling
At prediction time, use all units but scale each weight by its unit's probability of inclusion.
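The rationale: scaling the weights by p_keep reproduces the expected output over random dropout masks. A NumPy sketch with made-up weights and activations, checked against a Monte Carlo average:

```python
import numpy as np

p_keep = 0.5                              # probability a hidden unit was kept in training
W = np.array([[1.0, 2.0], [3.0, 4.0]])   # made-up hidden-to-output weights
h = np.array([2.0, -1.0])                # hidden activations at prediction time

# Weight scaling: use all units, but fold p_keep into the outgoing weights ...
y_scaled = h @ (W * p_keep)
y_expected = p_keep * (h @ W)            # the expectation E[(mu * h) @ W], in closed form

# ... which matches the average output over many sampled dropout masks:
rng = np.random.default_rng(5)
mc = np.mean([(h * (rng.random(2) < p_keep)) @ W for _ in range(20000)], axis=0)
print(y_scaled, mc)
```

This single scaled forward pass approximates the geometric-mean prediction of the exponentially many sub-networks without ever enumerating them.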
Adversarial Examples
Training on adversarial examples is mostly intended to improve security, but can sometimes provide generic regularization.
[Figure: "panda" (57% confidence) + adversarial noise = "gibbon" (99.3% confidence)]
Recap
Regularization of NN
§ Norm Penalties
§ Early Stopping
§ Data Augmentation
§ Sparse Representation
§ Bagging
§ Dropout