Machine Learning using Matlab - Uni Konstanz
Lecture 6 Neural Network (cont.)
Cost function
● For a neural network with K output units and m training examples, the regularized cost function is:
  J(Θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log(h_Θ(x^(i))_k) + (1 - y_k^(i)) log(1 - h_Θ(x^(i))_k) ] + (λ/2m) Σ_l Σ_i Σ_j (Θ_ji^(l))^2
Forward propagation
● Forward propagation from layer l to layer l+1 is computed as:
  z^(l+1) = Θ^(l) a^(l),  a^(l+1) = g(z^(l+1))  (with a bias unit prepended to each a^(l))
● Note when l = 1, a^(1) = x.

[Figure: four-layer network, Layers 1-4]
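As a quick sketch of these equations in Matlab (the sizes, the zero weights, and the example x are purely illustrative; g is the sigmoid activation):

```matlab
% Sigmoid activation
g = @(z) 1 ./ (1 + exp(-z));

% Illustrative sizes: 3 input units, 4 hidden units
Theta1 = zeros(4, 3 + 1);          % Theta^(1): maps layer 1 to layer 2
x = [0.5; -1.2; 0.3];              % one training example

a1 = [1; x];                       % a^(1) = x, plus bias unit
z2 = Theta1 * a1;                  % z^(l+1) = Theta^(l) * a^(l)
a2 = [1; g(z2)];                   % a^(l+1) = g(z^(l+1)), plus bias unit
```

With all-zero weights every z is 0 and g(0) = 0.5, so a2 = [1; 0.5; 0.5; 0.5; 0.5] here.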
Backpropagation
● Backpropagation from layer l+1 to layer l is computed as:
  δ^(l) = (Θ^(l))^T δ^(l+1) .* g'(z^(l)),  where for the sigmoid g'(z^(l)) = a^(l) .* (1 - a^(l))
● When l = L, δ^(L) = a^(L) - y.

[Figure: four-layer network, Layers 1-4]
Example
Given a training example (x, y), the cost function is first simplified (a single example, no regularization) to:
  J = -[ y log(h_Θ(x)) + (1 - y) log(1 - h_Θ(x)) ]
Forward propagation and backpropagation are then computed layer by layer as in the previous slides.

[Figure: four-layer network, forward propagation left to right, backpropagation right to left]
Gradient computation
1. Given training set {(x^(1), y^(1)), ..., (x^(m), y^(m))}
2. Set Δ_ij^(l) = 0 for all l, i, j
3. For i = 1 to m:
   ○ Set a^(1) = x^(i)
   ○ Perform forward propagation to compute a^(l) for l = 2, 3, ..., L
   ○ Using y^(i), compute δ^(L) = a^(L) - y^(i)
   ○ Compute δ^(L-1), δ^(L-2), ..., δ^(2)
   ○ Accumulate Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T
4. D_ij^(l) = (1/m) Δ_ij^(l) + λ Θ_ij^(l) for j ≠ 0, and D_ij^(l) = (1/m) Δ_ij^(l) for j = 0
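The loop above can be sketched in Matlab for a small 3-layer network (the tiny data set and constant initial weights are illustrative only, and regularization is omitted):

```matlab
% Gradient accumulation loop for a 3-layer network, sigmoid activations.
g = @(z) 1 ./ (1 + exp(-z));

m = 2;                                  % tiny illustrative training set
X = [0 1; 1 0];                         % m = 2 examples, 2 features (rows)
Y = [1 0; 0 1];                         % 2 classes, one-hot rows
Theta1 = 0.1 * ones(3, 3);              % layer 1 -> 2 (3 hidden units)
Theta2 = 0.1 * ones(2, 4);              % layer 2 -> 3 (2 outputs)

Delta1 = zeros(size(Theta1));           % step 2: accumulators set to zero
Delta2 = zeros(size(Theta2));
for i = 1:m
    a1 = [1; X(i, :)'];                 % set a^(1) = x^(i)
    z2 = Theta1 * a1;                   % forward propagation
    a2 = [1; g(z2)];
    a3 = g(Theta2 * a2);
    d3 = a3 - Y(i, :)';                 % delta^(L) = a^(L) - y^(i)
    d2 = (Theta2(:, 2:end)' * d3) .* g(z2) .* (1 - g(z2));  % backpropagate
    Delta2 = Delta2 + d3 * a2';         % accumulate
    Delta1 = Delta1 + d2 * a1';
end
D1 = Delta1 / m;                        % unregularized gradients (lambda = 0)
D2 = Delta2 / m;
```

Note the bias column of Theta2 is skipped (`Theta2(:, 2:end)`) when backpropagating, since the bias unit has no incoming error.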
Random initialization
● Instead of initializing the parameters to all zeros, it is important to initialize them randomly.
● Random initialization serves the purpose of symmetry breaking: if all weights start at the same value, every unit in a layer computes the same function and receives identical updates, so the units never differentiate.
Random initialization - Matlab function
● Initialize each parameter to a random value in [-epsilon_init, epsilon_init]:

function W = randInitializeWeights(L_in, L_out)
    epsilon_init = 0.1;
    W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end
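A usage sketch, written with an anonymous-function equivalent so the script is self-contained (the layer sizes match the four-layer example used later in this lecture):

```matlab
% Equivalent of randInitializeWeights as an anonymous function:
% each entry uniform in [-0.1, 0.1].
randInit = @(L_in, L_out) rand(L_out, 1 + L_in) * 0.2 - 0.1;

Theta1 = randInit(4, 5);    % layer 1 -> 2: 5 x 5
Theta2 = randInit(5, 5);    % layer 2 -> 3: 5 x 6
Theta3 = randInit(5, 4);    % layer 3 -> 4: 4 x 6

% With probability 1, no two units start out identical: symmetry is broken.
```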
Advanced optimization
● We have seen how to call existing numerical optimization functions to find the optimal parameters:
  ○ function [J, grad] = costFunction(theta) ...
  ○ optTheta = minFunc(@costFunction, initialTheta, options)
● In the following neural network we have three parameter matrices; how do we feed them into the "minFunc" function?
● "Unroll" the matrices into a single vector.

[Figure: four-layer network, Layers 1-4]
Advanced optimization - example
L = 4, s1 = 4, s2 = 5, s3 = 5, s4 = 4
Θ(1) ∈ ℝ5×5, Θ(2) ∈ ℝ5×6, Θ(3) ∈ ℝ4×6

Matlab implementation:
1. Unroll: thetaVec = [Theta1(:); Theta2(:); Theta3(:)]
2. Feed thetaVec into "minFunc"
3. Reshape thetaVec in "costFunction":
   a. Theta1 = reshape(thetaVec(1:25), 5, 5);
   b. Theta2 = reshape(thetaVec(26:55), 5, 6);
   c. Theta3 = reshape(thetaVec(56:79), 4, 6);
   d. Compute J and grad
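The unroll/reshape scheme above is a lossless round trip, because `(:)` and `reshape` both use column-major order. A quick check with placeholder values:

```matlab
% Round-trip check for the unroll/reshape scheme (placeholder values).
Theta1 = reshape(1:25, 5, 5);           % 5 x 5
Theta2 = reshape(1:30, 5, 6);           % 5 x 6
Theta3 = reshape(1:24, 4, 6);           % 4 x 6

thetaVec = [Theta1(:); Theta2(:); Theta3(:)];   % 25 + 30 + 24 = 79 entries

T1 = reshape(thetaVec(1:25), 5, 5);
T2 = reshape(thetaVec(26:55), 5, 6);
T3 = reshape(thetaVec(56:79), 4, 6);
% T1, T2, T3 are identical to the original matrices.
```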
Gradient check
● With so many parameters, how can we be sure the computed gradient is correct?
● Recall the numerical estimation of gradients; we can compare the analytic gradient with the two-sided numerical estimate:
  ∂J/∂θ_j ≈ (J(θ + ε e_j) - J(θ - ε e_j)) / (2ε)
Gradient check
Gradient check
● Implementation note:
  ○ Implement backpropagation to compute the gradient
  ○ Implement a numerical gradient check to compute the estimated gradient
  ○ Make sure the two have similar values (difference below a threshold)
  ○ Turn off the gradient check for training
● Note:
  ○ Be sure to disable your gradient check code before training; otherwise learning will be very slow
  ○ The gradient check generalizes to the gradient of any cost function
Overview: train a neural network
● Design a network architecture
● Randomly initialize the weights
● Implement forward propagation to get the hypothesis h_Θ(x^(i)) for any x^(i)
● Implement code to compute the cost function J(Θ)
● Implement backpropagation to compute the partial derivatives of J(Θ)
● Use a gradient check to compare the backpropagation gradient with the numerical estimate of the gradient of J(Θ); if they agree, disable the gradient checking code
● Use gradient descent or an advanced optimization method to minimize J(Θ) as a function of the parameters Θ
Deep feedforward Neural Networks
Other architectures
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Discussion
● More parameters make a more powerful model.
  ○ Which is better: more layers or more neurons?
  ○ What are the disadvantages?
● The neural network cost function is non-convex, so gradient descent is susceptible to local optima; in practice it works fairly well even when the optimum found is not global.
● A neural network is a black-box model: it is hard to interpret what individual units have learned.
From logistic regression to SVM

Logistic regression
● Label: y ∈ {0, 1}
● Hypothesis: h_θ(x) = 1 / (1 + e^(-θ^T x))
● Objective: minimize the (regularized) cross-entropy cost over θ

Support Vector Machine (SVM)
● Label: y ∈ {-1, +1}
● Hypothesis: h(x) = sign(w^T x + b)
● Objective: minimize the (regularized) hinge-loss cost over w, b
From logistic regression to SVM
● Logistic regression cost (per example): -y log h_θ(x) - (1 - y) log(1 - h_θ(x))
● SVM cost (per example): max(0, 1 - y (w^T x + b))
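The two per-example costs can be compared numerically as a function of the margin z (the variable names here are illustrative; for a positive example, z = θ^T x for logistic regression and z = w^T x + b for the SVM):

```matlab
% Evaluate both per-example costs at a few margins z, for a positive example.
z = [-2; 0; 2];
logisticCost = log(1 + exp(-z));       % -log(sigmoid(z)), the log loss
hingeCost = max(0, 1 - z);             % the SVM hinge loss

% Both penalize z << 0 heavily; the hinge loss is exactly zero once z >= 1,
% while the log loss only approaches zero asymptotically.
```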
From logistic regression to SVM
● Logistic regression: min_θ (1/m) Σ_i [ -y^(i) log h_θ(x^(i)) - (1 - y^(i)) log(1 - h_θ(x^(i))) ] + (λ/2m) ||θ||^2
● SVM: min_{w,b} (1/2) ||w||^2 + C Σ_i max(0, 1 - y^(i) (w^T x^(i) + b))
SVM - model representation
● Given training examples {(x^(i), y^(i))}, y^(i) ∈ {-1, +1}, SVM aims to find an optimal hyperplane w^T x + b = 0 so that:
  y^(i) (w^T x^(i) + b) ≥ 1 for all i
● This is equivalent to minimizing the following cost function:
  min_{w,b} (1/2) ||w||^2 + C Σ_i max(0, 1 - y^(i) (w^T x^(i) + b))
● Here max(0, 1 - y (w^T x + b)) is called the hinge loss.
SVM - gradient computing
● Because the hinge loss is not differentiable at the hinge, a sub-gradient is computed:
  ∂/∂w = w - C Σ_i y^(i) x^(i)  over the examples with y^(i) (w^T x^(i) + b) < 1, and ∂/∂w = w when no example violates the margin.
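One sub-gradient step can be sketched as follows (the toy data, starting point, and learning rate are illustrative; the bias b is omitted for brevity):

```matlab
% One subgradient step for the hinge-loss SVM objective
%   J(w) = (1/2)*||w||^2 + C * sum_i max(0, 1 - y_i * w' * x_i)
C = 1;
X = [2 0; 0 2; -2 0; 0 -2];        % 4 examples (rows)
y = [1; 1; -1; -1];                % labels in {-1, +1}
w = [0; 0];                        % initial weights
alpha = 0.1;                       % learning rate

margins = y .* (X * w);            % y_i * w' * x_i for every example
active = margins < 1;              % examples inside or violating the margin
subgrad = w - C * X(active, :)' * y(active);   % only active examples contribute
w = w - alpha * subgrad;           % move opposite the subgradient
```

At w = 0 every example is active, so the step pulls w toward the positive class; here it produces w = [0.4; 0.4], pointing from the negative examples toward the positive ones.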
SVM - intuition
● Which of the linear classifiers is optimal?
SVM - intuition
SVM - intuition
1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only the support vectors matter; the other training examples can be ignored.
[Figure: maximum-margin hyperplane with support vectors on the margin boundaries]
SVM - intuition
● Which linear classifier has better performance?