Deep Learning - NTU Speech Processing...
Transcript of Deep Learning - NTU Speech Processing...
Source: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/DL.pdf
Deep Learning
Hung-yi Lee (李宏毅)
Deep learning attracts a lot of attention.
• I believe you have seen many exciting results before.
Deep learning trends at Google. Source: SIGMOD 2016/Jeff Dean
Ups and downs of Deep Learning
• 1958: Perceptron (a linear model)
• 1969: Perceptron has limitations
• 1980s: Multi-layer perceptron
  • Not significantly different from today's DNNs
• 1986: Backpropagation
  • Usually more than 3 hidden layers did not help
• 1989: 1 hidden layer is “good enough”; why go deep?
• 2006: RBM initialization
• 2009: GPU
• 2011: Starts to become popular in speech recognition
• 2012: Wins the ILSVRC image competition
• 2015.2: Image recognition surpasses human-level performance
• 2016.3: AlphaGo beats Lee Sedol
• 2016.10: Speech recognition systems as good as humans
Three Steps for Deep Learning
• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function
Deep Learning is so simple ……
Neural Network
[Figure: a single “neuron” computes a weighted sum z of its inputs plus a bias, then applies an activation function to z]
Different connections lead to different network structures.
Network parameters 𝜃: all the weights and biases in the “neurons”.
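As a concrete sketch of a single neuron (a minimal illustration; the weight and bias values are just examples):

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

# Example values: inputs (1, -1), weights (1, -2), bias 1 give z = 4
print(round(neuron([1.0, -1.0], [1.0, -2.0], 1.0), 2))  # 0.98
```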
Fully Connect Feedforward Network
Sigmoid Function: 𝜎(z) = 1 / (1 + e⁻ᶻ)
[Figure: input (1, −1); the first layer's weights (1, −2) and (−1, 1) with biases 1 and 0 give z = 4 and z = −2, so the sigmoid outputs are 0.98 and 0.12]
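The first-layer computation above can be reproduced in plain Python (a sketch; `layer` is a hypothetical helper name):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(x, W, b):
    """One fully connected layer; each row of W holds one neuron's weights."""
    return [sigmoid(sum(w * a for w, a in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# Weights, biases, and input from the slide: z = (4, -2)
out = layer([1, -1], [[1, -2], [-1, 1]], [1, 0])
print([round(v, 2) for v in out])  # [0.98, 0.12]
```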
Fully Connect Feedforward Network
[Figure: propagating input (1, −1) layer by layer: (0.98, 0.12) → (0.86, 0.11) → output (0.62, 0.83)]
Fully Connect Feedforward Network
[Figure: with input (0, 0), the same network outputs (0.51, 0.85)]
𝑓([1, −1]) = [0.62, 0.83]   𝑓([0, 0]) = [0.51, 0.85]
This is a function: input vector in, output vector out.
Given a network structure, we define a function set.
Fully Connect Feedforward Network
[Figure: Input Layer (x1 … xN) → Hidden Layers (Layer 1, Layer 2, …, Layer L, each a layer of neurons) → Output Layer (y1 … yM)]
Deep = Many hidden layers
• AlexNet (2012): 8 layers, 16.4% error
• VGG (2014): 19 layers, 7.3% error
• GoogleNet (2014): 22 layers, 6.7% error
Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
Deep = Many hidden layers
• AlexNet (2012): 16.4% • VGG (2014): 7.3% • GoogleNet (2014): 6.7%
• Residual Net (2015): 152 layers (special structure), 3.57% error
(For scale: Taipei 101 is 101 stories tall.)
Ref: https://www.youtube.com/watch?v=dxB6299gpvI
Matrix Operation
𝜎( [[1, −2], [−1, 1]] · [1, −1]ᵀ + [1, 0]ᵀ ) = 𝜎([4, −2]ᵀ) = [0.98, 0.12]ᵀ
(i.e., one layer computes y = 𝜎(Wx + b))
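The same layer written as a matrix operation with NumPy (a sketch of 𝜎(Wx + b) using the slide's numbers):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0, 1.0]])   # weight matrix from the slide
b = np.array([1.0, 0.0])      # bias vector
x = np.array([1.0, -1.0])     # input

a = sigmoid(W @ x + b)        # sigma(W x + b)
print(np.round(a, 2))         # [0.98 0.12]
```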
Neural Network
[Figure: x →(W¹, b¹)→ a¹ →(W², b²)→ a² → … →(Wᴸ, bᴸ)→ y]
a¹ = 𝜎(W¹ x + b¹)
a² = 𝜎(W² a¹ + b²)
……
y = 𝜎(Wᴸ aᴸ⁻¹ + bᴸ)
Neural Network
y = 𝑓(x) = 𝜎(Wᴸ ⋯ 𝜎(W² 𝜎(W¹ x + b¹) + b²) ⋯ + bᴸ)
Using parallel computing techniques (e.g., GPUs) to speed up these matrix operations.
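The nested formula can be sketched as a loop, one matrix operation per layer (the network sizes and weights here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, weights, biases):
    """y = f(x) = sigma(W^L ... sigma(W^2 sigma(W^1 x + b^1) + b^2) ... + b^L)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # one layer per iteration
    return a

# A made-up 3-layer network: 2 -> 4 -> 4 -> 2
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 2)),
      rng.standard_normal((4, 4)),
      rng.standard_normal((2, 4))]
bs = [np.zeros(4), np.zeros(4), np.zeros(2)]

y = forward(np.array([1.0, -1.0]), Ws, bs)
print(y.shape)   # (2,)
```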
Output Layer as Multi-Class Classifier
[Figure: Input Layer (x) → Hidden Layers, acting as a feature extractor replacing feature engineering → Output Layer with Softmax = a multi-class classifier]
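Softmax can be sketched as follows (the input values are just an example):

```python
import numpy as np

def softmax(z):
    """Exponentiate and normalize, so outputs are positive and sum to 1."""
    e = np.exp(z - np.max(z))   # subtracting the max improves numerical stability
    return e / e.sum()

y = softmax(np.array([3.0, 1.0, -3.0]))
print(np.round(y, 2))   # approximately [0.88, 0.12, 0.00]
```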
Example Application
• Input: a 16 × 16 image = 256 dimensions (ink → 1, no ink → 0)
• Output: 10 dimensions y1 … y10; each dimension represents the confidence that the image is the corresponding digit (y1: is “1”, y2: is “2”, …, y10: is “0”)
• E.g., y1 = 0.1, y2 = 0.7, y3 = 0.2 → the image is “2”
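Reading off the predicted digit is just taking the most confident output (the first three scores are the slide's example; the rest are assumed zero):

```python
scores = [0.1, 0.7, 0.2, 0, 0, 0, 0, 0, 0, 0]   # y1 ... y10
best = max(range(10), key=lambda i: scores[i])   # index of the largest confidence
digit = (best + 1) % 10                          # y1 means "1", ..., y10 means "0"
print(digit)  # 2
```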
Example Application
• Handwriting Digit Recognition: Machine(image) → “2”
What is needed is a function ……
• Input: 256-dim vector
• Output: 10-dim vector (y1: is “1”, y2: is “2”, …, y10: is “0”)
The Neural Network provides this function.
Example Application
[Figure: Input Layer (x1 … xN) → Layer 1 … Layer L → Output Layer (y1 … y10, with y1: is “1”, …, y10: is “0”) → “2”]
A function set containing the candidates for Handwriting Digit Recognition.
You need to decide the network structure so that your function set contains a good function.
FAQ
• Q: How many layers? How many neurons for each layer? → Trial and error + intuition
• Q: Can the structure be automatically determined? • E.g., Evolutionary Artificial Neural Networks
• Q: Can we design the network structure? • E.g., Convolutional Neural Network (CNN)
Three Steps for Deep Learning
• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function
Deep Learning is so simple ……
Loss for an Example
Given a set of parameters, and an image of “1” with one-hot target ŷ = [1, 0, 0, …, 0]ᵀ, the network produces y = (y1, …, y10) through a Softmax output layer.
Cross entropy between the output y and the target ŷ:
𝑙(𝑦, ŷ) = −Σᵢ₌₁¹⁰ ŷᵢ ln 𝑦ᵢ
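The cross-entropy loss for one example can be sketched as (the network output below is hypothetical):

```python
import numpy as np

def cross_entropy(y, y_hat):
    """l(y, y_hat) = -sum_i y_hat_i * ln(y_i)."""
    return -np.sum(y_hat * np.log(y))

y_hat = np.zeros(10)
y_hat[0] = 1.0                 # target: the digit "1", one-hot
y = np.full(10, 0.3 / 9)       # hypothetical softmax output
y[0] = 0.7
print(round(cross_entropy(y, y_hat), 3))   # -ln(0.7) ≈ 0.357
```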
Total Loss
For all training data {(x¹, ŷ¹), …, (xᴺ, ŷᴺ)}, each example xⁿ gives a per-example loss 𝑙ⁿ.
Total Loss: 𝐿 = Σₙ₌₁ᴺ 𝑙ⁿ
Find the network parameters 𝜃* that minimize the total loss L.
Equivalently: find the function in the function set that minimizes the total loss L.
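Summing the per-example losses into the total loss (a sketch with two hypothetical two-class examples):

```python
import numpy as np

def example_loss(y, y_hat):
    return -np.sum(y_hat * np.log(y))   # cross entropy for one example

def total_loss(outputs, targets):
    """L = sum_{n=1}^{N} l^n over all N training examples."""
    return sum(example_loss(y, t) for y, t in zip(outputs, targets))

outputs = [np.array([0.7, 0.3]), np.array([0.4, 0.6])]
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(round(total_loss(outputs, targets), 3))
```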
Three Steps for Deep Learning
• Step 1: define a set of functions (Neural Network)
• Step 2: goodness of function
• Step 3: pick the best function
Deep Learning is so simple ……
Gradient Descent
Start from parameters 𝜃 = {w1 = 0.2, w2 = −0.1, …, b1 = 0.3, …}.
Compute each partial derivative and update with learning rate 𝜇:
w1 ← w1 − 𝜇 𝜕𝐿/𝜕𝑤1   (0.2 → 0.15)
w2 ← w2 − 𝜇 𝜕𝐿/𝜕𝑤2   (−0.1 → 0.05)
b1 ← b1 − 𝜇 𝜕𝐿/𝜕𝑏1   (0.3 → 0.2)
……
The vector of all the partial derivatives is the gradient:
𝛻𝐿 = [𝜕𝐿/𝜕𝑤1, 𝜕𝐿/𝜕𝑤2, …, 𝜕𝐿/𝜕𝑏1, …]ᵀ
Gradient Descent
Repeat the compute-and-update step:
w1: 0.2 → 0.15 → 0.09 → ……
w2: −0.1 → 0.05 → 0.15 → ……
b1: 0.3 → 0.2 → 0.10 → ……
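The update rule 𝜃 ← 𝜃 − 𝜇𝛻𝐿 can be sketched on a toy loss (this is not the network loss; the quadratic below is chosen only so the minimum is known, and the gradient is estimated numerically):

```python
def gradient(loss, theta, eps=1e-6):
    """Numerically estimate each partial derivative dL/d(theta_i)."""
    grads = []
    for i in range(len(theta)):
        bumped = list(theta)
        bumped[i] += eps
        grads.append((loss(bumped) - loss(theta)) / eps)
    return grads

def gradient_descent(loss, theta, mu=0.1, steps=100):
    """Repeat: theta <- theta - mu * grad(L)."""
    for _ in range(steps):
        g = gradient(loss, theta)
        theta = [t - mu * gi for t, gi in zip(theta, g)]
    return theta

# Toy loss with its minimum at (1, -2); start from the slide's (0.2, -0.1)
loss = lambda th: (th[0] - 1) ** 2 + (th[1] + 2) ** 2
theta = gradient_descent(loss, [0.2, -0.1])
print([round(t, 2) for t in theta])   # [1.0, -2.0]
```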
Gradient Descent
This is the “learning” of machines in deep learning ……
Even AlphaGo uses this approach.
I hope you are not too disappointed :p
What people imagine …… vs. what actually happens ……
Backpropagation
• Backpropagation: an efficient way to compute 𝜕𝐿/𝜕𝑤 in a neural network
• libdnn: developed by NTU student 周伯威
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
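A minimal illustration of the chain rule that backpropagation organizes efficiently, for one sigmoid neuron with a squared-error loss (a toy sketch of the idea, not the full algorithm over a network):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Forward pass: y = sigmoid(w*x + b), loss L = (y - t)^2
x, t = 1.0, 0.0
w, b = 0.5, 0.0
y = sigmoid(w * x + b)

# Backward pass (chain rule): dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = 2 * (y - t)
dy_dz = y * (1 - y)            # derivative of the sigmoid
dL_dw = dL_dy * dy_dz * x
dL_db = dL_dy * dy_dz

# Sanity check against a numerical derivative
eps = 1e-6
numeric = ((sigmoid((w + eps) * x + b) - t) ** 2 - (y - t) ** 2) / eps
print(abs(dL_dw - numeric) < 1e-4)   # True
```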
Step 1: define a set of function
Step 2: goodness of
function
Step 3: pick the best function
Three Steps for Deep Learning
Deep Learning is so simple ……
Neural Network
Acknowledgment
• Thanks to Victor Chen for spotting typos on the slides.