Introduction to Machine Learning: Neural Networks
Bhaskar Mukhoty, Shivam Bansal
Indian Institute of Technology Kanpur, Summer School 2019
June 4, 2019
Bhaskar Mukhoty, Shivam Bansal ( Indian Institute of Technology Kanpur Summer School 2019 )Introduction to Machine Learning June 4, 2019 1 / 30
Lecture Outline
Neural Networks
Backpropagation Algorithm
Convolutional NNs
Recurrent NNs
Recap
Linear models: Learn a linear hypothesis function h in the input/attribute space X.
Kernelized models: Map inputs x from attribute space X to feature space F via a feature map φ(x), and learn a linear hypothesis function h in the feature space.
Neural Networks
A neural network consists of an input layer, an output layer, and one or more hidden layers.
Each node in a hidden layer computes a nonlinear transform of the inputs it receives.
Neural network with single hidden layer
Each input $x_n$ is transformed into several "pre-activations" using linear models:

$$a_{nk} = w_k^\top x_n = \sum_{d=1}^{D} w_{dk} x_{nd}$$

A non-linear activation is applied to each pre-activation:

$$h_{nk} = g(a_{nk})$$
Neural network with single hidden layer
A linear model is applied on the new features $h_n$:

$$s_n = v^\top h_n = \sum_{k=1}^{K} v_k h_{nk}$$

Finally, the output is produced as $y_n = o(s_n)$.

The overall effect is a non-linear mapping from inputs to outputs.
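The two-stage computation above (pre-activations, nonlinear activation, then a linear model on the new features) can be sketched in NumPy. The sizes, the tanh activation, and the identity output function `o` are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def forward(x, W, v, g=np.tanh, o=lambda s: s):
    """Forward pass of a single-hidden-layer network.

    x: (D,) input, W: (D, K) hidden weights, v: (K,) output weights.
    g is the hidden activation; o is the output function (identity here).
    """
    a = W.T @ x   # pre-activations a_k = w_k^T x
    h = g(a)      # hidden features h_k = g(a_k)
    s = v @ h     # linear model on the new features
    return o(s)

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # D = 4 inputs
W = rng.normal(size=(4, 3))   # K = 3 hidden units
v = rng.normal(size=3)
y = forward(x, W, v)          # a single scalar output
```

Because g is nonlinear, the composite map x → y is nonlinear even though each stage is a linear model followed by an elementwise function.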
Neural Network
Fully-connected Feedforward Neural Network
Fully-connected: All pairs of nodes between adjacent layers are connected to each other.
Feedforward: No backward connections; only nodes in adjacent layers are connected.
Neural networks are feature learners
An NN tries to learn features that can predict the output well.
Neural Networks as Feature Learners
Figure: [Zeiler and Fergus, 2014]
Learning Neural Networks via Backpropagation
Backpropagation is gradient descent using the chain rule of derivatives.

Chain rule of derivatives: for example, if $y = f_1(x)$ and $x = f_2(z)$, then

$$\frac{\partial y}{\partial z} = \frac{\partial y}{\partial x} \frac{\partial x}{\partial z}$$
Learning Neural Networks via Backpropagation
Backpropagation iterates between a forward pass and a backward pass.
The forward pass computes the errors using the current parameters.
The backward pass computes the gradients and updates the parameters, starting from the parameters at the top layer and then moving backwards.
Using backpropagation in neural nets enables us to reuse previous computations efficiently.
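A minimal sketch of one forward/backward iteration for the single-hidden-layer network, assuming tanh hidden units, an identity output, and squared loss (these specifics are illustrative assumptions, not prescribed by the slides):

```python
import numpy as np

def backprop_step(x, y, W, v, lr=0.1):
    """One gradient-descent step via backpropagation.

    Network: h = tanh(W^T x), s = v^T h, loss L = 0.5 * (s - y)^2.
    Shapes: x (D,), y scalar, W (D, K), v (K,).
    """
    # forward pass: compute the error with the current parameters
    a = W.T @ x
    h = np.tanh(a)
    s = v @ h
    err = s - y                    # dL/ds
    # backward pass: chain rule, reusing the forward-pass quantities
    grad_v = err * h               # dL/dv = dL/ds * ds/dv
    grad_a = err * v * (1 - h**2)  # dL/da, since dtanh(a)/da = 1 - tanh(a)^2
    grad_W = np.outer(x, grad_a)   # dL/dW = x * (dL/da)^T
    return W - lr * grad_W, v - lr * grad_v
```

Note how `h` and `err`, computed once in the forward pass, are reused by every gradient in the backward pass; this reuse is the efficiency gain mentioned above.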
Activation Functions
Sigmoid: $h = \sigma(a) = \frac{1}{1 + \exp(-a)}$
tanh (hyperbolic tangent): $h = \frac{\exp(a) - \exp(-a)}{\exp(a) + \exp(-a)} = 2\sigma(2a) - 1$
ReLU (Rectified Linear Unit): $h = \max(0, a)$
Leaky ReLU: $h = \max(\beta a, a)$, where $\beta$ is a small positive number
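The four activations above can be written directly in NumPy; the default β = 0.01 for Leaky ReLU is a common but arbitrary choice:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    # uses the identity tanh(a) = 2*sigmoid(2a) - 1 from the slide
    return 2.0 * sigmoid(2.0 * a) - 1.0

def relu(a):
    return np.maximum(0.0, a)

def leaky_relu(a, beta=0.01):
    # beta is a small positive slope for negative inputs
    return np.maximum(beta * a, a)
```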
Activation Functions
Sigmoid and tanh can have issues with saturating gradients.
If the weights are too large, the activations saturate, the gradient for the weights is close to zero, and learning becomes slow or may stop.
Pic credit: Andrej Karpathy
Activation Functions
The ReLU activation function has the "dead ReLU" problem.
If the weights are initialized such that the output of a node is 0, the gradient for its weights is zero and the node never fires.
Pic credit: Andrej Karpathy
Preventing overfitting in Neural Networks
Weight decay: l1 or l2 regularization on the weights.
Early stopping: Stop when validation error starts increasing.
Dropout: Randomly remove units (with some probability p ∈ (0, 1)) during training.
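Dropout amounts to multiplying activations by a random binary mask. The sketch below uses the common "inverted dropout" convention, rescaling survivors by 1/(1−p); that rescaling is an implementation detail beyond the slide:

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Drop each unit with probability p during training.

    Surviving units are scaled by 1/(1-p) so that the expected
    activation matches test time, where no units are dropped.
    """
    if not train:
        return h
    mask = rng.random(h.shape) >= p   # True = keep the unit
    return h * mask / (1.0 - p)
```

At test time (`train=False`) the layer is the identity, so no rescaling is needed then.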
Convolutional Neural Network
CNNs are feedforward neural networks.
Weights are shared among the connections.
The set of distinct weights defines a filter or local feature detector.
Convolution
An operation that captures spatially local patterns in the input.
Usually several filters $\{W^k\}_{k=1}^{K}$ are applied, each producing a separate feature map.
These filters are learned using backpropagation.
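A direct (unoptimized) sketch of applying one filter to a 2D input. Like most deep-learning libraries, it actually computes cross-correlation, which is what "convolution" conventionally means in CNNs:

```python
import numpy as np

def conv2d(x, w):
    """Valid 2D convolution (cross-correlation) of input x (H, W)
    with a single filter w (kh, kw), producing one feature map."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # the same weights w are reused at every spatial location:
            # this is the weight sharing described above
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out
```

Because the same small `w` slides over the whole input, the filter responds to the same local pattern wherever it occurs.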
Pooling
An operation that reduces the dimension of the input.
The pooling operation is fixed beforehand, not learned.
Popular pooling approaches: max-pooling, average pooling.
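Max-pooling over non-overlapping k × k windows can be sketched with a reshape; the window size and non-overlapping stride are illustrative choices:

```python
import numpy as np

def max_pool(x, k=2):
    """Non-overlapping k x k max-pooling of x (H, W).

    Assumes H and W are divisible by k. Each output entry is the
    maximum over one k x k block, shrinking each dimension by k.
    """
    H, W = x.shape
    return x.reshape(H // k, k, W // k, k).max(axis=(1, 3))
```

Average pooling is the same reshape with `.mean(axis=(1, 3))` instead of `.max`; neither has any learned parameters.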
Convolutional Neural Network
Modeling sequential data
Examples of sequential data: videos, text, speech.
FFNN on a single observation $x_n$; FFNN on sequential data $x_1, \ldots, x_T$.
For sequential data, we want dependencies between the $h_t$'s of different observations.
Recurrent Neural Networks
A neural network for sequential data.
Each hidden state $h_t = f(Wx_t + Uh_{t-1})$, where $U$ is a $K \times K$ matrix and $f$ is some activation function.
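The recurrence h_t = f(Wx_t + Uh_{t−1}) unrolled over a sequence, with f = tanh as an assumed activation and a zero initial state:

```python
import numpy as np

def rnn_forward(xs, W, U, h0=None):
    """Run h_t = tanh(W x_t + U h_{t-1}) over a sequence.

    xs: (T, D) sequence of inputs, W: (K, D), U: (K, K).
    Returns the hidden states as a (T, K) array.
    """
    K = W.shape[0]
    h = np.zeros(K) if h0 is None else h0
    hs = []
    for x in xs:
        h = np.tanh(W @ x + U @ h)   # same W, U reused at every step
        hs.append(h)
    return np.array(hs)
```

The same W and U are applied at every time step, so h_t depends on the entire prefix x_1, …, x_t, which is exactly the cross-observation dependency a plain FFNN lacks.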
Different types of RNN
Both input and output can be sequences of different lengths.
Backpropagation through time
Think of the time dimension as another hidden layer; then it is just like standard backpropagation for feedforward neural nets.
RNN Limitations
Vanishing or exploding gradients: Repeated multiplication can cause gradients to vanish or explode.
Weak long-term dependency: Repeated composition of functions causes the sensitivity of the hidden states to a given part of the input to become weaker as we move along the sequence.
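A toy illustration of the repeated-multiplication effect, ignoring the activation's derivative: backpropagating through T steps multiplies the gradient by Uᵀ at each step, so its norm shrinks or grows exponentially depending on U's largest singular value (the matrices below are artificial examples):

```python
import numpy as np

def backprop_norms(U, T=30):
    """Track the norm of a gradient vector repeatedly multiplied by U^T,
    as happens when backpropagating through T time steps of an RNN."""
    g = np.ones(U.shape[0])
    norms = []
    for _ in range(T):
        g = U.T @ g
        norms.append(np.linalg.norm(g))
    return norms

vanish = backprop_norms(0.5 * np.eye(4))   # largest singular value 0.5
explode = backprop_norms(1.5 * np.eye(4))  # largest singular value 1.5
```

After 30 steps the first gradient has shrunk by a factor of 0.5³⁰ while the second has grown by 1.5³⁰, which is why distant inputs contribute almost nothing (or wildly too much) to the update.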
Long Short-Term Memory
An RNN with hidden nodes having gates to remember or forget information.
An open gate is denoted by 'o' and a closed gate by '-'.
Minor variations of the LSTM exist depending on the gates used, e.g. the GRU.
Gated Recurrent Unit (Simplified)
An RNN computes hidden states as

$$h_t = \tanh(Wx_t + Uh_{t-1})$$

For the RNN, the state update is multiplicative (weak memory and gradient issues).

A GRU computes hidden states as

$$\tilde{h}_t = \tanh(Wx_t + Uh_{t-1})$$
$$\Gamma_u = \sigma(Px_t + Qh_{t-1})$$
$$h_t = \Gamma_u \times \tilde{h}_t + (1 - \Gamma_u) \times h_{t-1}$$

For the GRU, the state update is additive.
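One step of the simplified GRU in NumPy, following the three equations above; the shapes are illustrative and, as on the slide, there are no bias terms:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, W, U, P, Q):
    """One simplified GRU update.

    x: (D,) input, h_prev: (K,) previous state,
    W, P: (K, D) and U, Q: (K, K) weight matrices.
    """
    h_tilde = np.tanh(W @ x + U @ h_prev)    # candidate state, as in an RNN
    gamma_u = sigmoid(P @ x + Q @ h_prev)    # update gate, entries in (0, 1)
    # additive (convex-combination) update: the gate decides, per unit,
    # how much of the old state to keep versus overwrite
    return gamma_u * h_tilde + (1.0 - gamma_u) * h_prev
```

When the gate is near 0 the old state is carried forward almost unchanged, which is how the GRU preserves long-term information and eases the gradient issues of the plain RNN.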
Questions?
References I
Andrew Ng (2019). Sequence models. https://www.coursera.org/learn/nlp-sequence-models.
Carter, S. (2019). Visualize feed-forward neural network. https://playground.tensorflow.org/.
Kar, P. (2017). Introduction to machine learning. https://web.cse.iitk.ac.in/users/purushot/courses/ml/2017-18-a.
References II
Rai, P. (2018). Introduction to machine learning. https://www.cse.iitk.ac.in/users/piyush/courses/ml_autumn18/index.html.
Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer.