Motivation · Background of Neural Networks · Deep Learning · Conclusion
Deep Learning
Kevin Patel
June 24, 2016
Kevin Patel Deep Learning 1 / 61
Outline
1 Motivation
2 Background of Neural Networks (Perceptron, Feedforward Neural Networks)
3 Deep Learning (Greedy Layerwise Unsupervised Pretraining)
4 Conclusion
Human Beings and Computer Science
What qualities of human beings make them good computer scientists?
Impatience
Laziness
Machine Translation
Information Retrieval
Sentiment Analysis
Human Beings and Computer Science
What qualities of human beings make them good computer scientists?
Impatience
Laziness
Major reasons for the evolution of the field of Artificial Intelligence
Know the limits
AI always fascinates people
Hyped by movies
Important to know what is actually possible
Also helps to decide where to push
Classification of AI systems
Two major categories based on how they work:
Rule-based
Statistical
Statistical systems are further classified based on their data usage:
Supervised
Unsupervised
Statistical systems are also classified based on the problem solved:
Classification
Regression
...
A Refresher in Linear Algebra
Vector representations
Vector operations
Similarity between vectors
A Refresher in Optimization
Minima and Maxima
Local and Global Minima
Partial Differentiation from Calculus
Perceptron · Feedforward Neural Networks
The Ultimate Computing Machine
The award for the most amazing computing machine goes to
Human Brain
Who gave it this award?
We, the researchers of AI, did
How exactly?
By constantly trying to imitate it
Perceptron Algorithm
Given a set of input/label pairs (x_1, y_1), ..., (x_n, y_n)
Learn a function that classifies the inputs
Learn a set of weights (w_1, ..., w_m) for the input features

f(x) = 1 if ∑_{i=1}^{m} w_i x_i > 0, and 0 otherwise
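As a quick sketch, the decision rule above can be written directly in Python (function and variable names are illustrative, not from the slides):

```python
def predict(weights, x):
    """Perceptron decision rule: 1 if the weighted sum of inputs is positive, else 0."""
    s = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return 1 if s > 0 else 0

# e.g. with weights [1, 1], a linear separator for OR:
print(predict([1.0, 1.0], [0, 1]))  # 1
```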
Perceptron Algorithm
[Diagram: inputs x_0, x_1, ..., x_n with weights w_1, ..., w_n feeding a weighted sum ∑ and an activation function]
Perceptron Algorithm
[Diagram: Problem → Hand-crafted Features → Trainable Classifier → Output]
Perceptron Example
The perceptron calculates y = ∑_{i=1}^{m} w_i x_i + b
This is similar to y = m × x + c, the equation of a line in 2D (a hyperplane in general)
It divides the input feature space into two regions (positive and negative class regions)
Training a Perceptron
Algorithm 2.1: Perceptron Algorithm(D)
  comment: Initialize the feature weights to zero or randomly
  w = zeros()
  for each (x, y) ∈ D do:
      comment: Calculate prediction
      t = f(∑_{i=1}^{m} w_i x_i)
      comment: Update the feature weights
      w = w + α(y − t)x
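The training loop above can be sketched in a few lines of Python; the OR-gate data from the worked example in the slides is used here, and the learning rate α is assumed to be 1, matching that example:

```python
def train_perceptron(data, alpha=1.0, epochs=10):
    """Train a perceptron on (x, y) pairs; returns the learned weight vector."""
    m = len(data[0][0])
    w = [0.0] * m                                   # zero initialization
    for _ in range(epochs):
        for x, y in data:
            # prediction t = f(sum_i w_i x_i), with f the step function
            t = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            # update w = w + alpha * (y - t) * x
            w = [wi + alpha * (y - t) * xi for wi, xi in zip(w, x)]
    return w

OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(OR, epochs=1))  # [1.0, 1.0] after one pass, as in the worked example
```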
Training a Perceptron: Example
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1
Train a perceptron for OR gate
Learn weights w = [w1,w2]
Training a Perceptron: Example
Initialize weights to zero, w1 = 0,w2 = 0
Input, x1 = 0, x2 = 0
t = f( w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 0) = f(0) = 0
w = w + α(y − t)x = w + (0− 0)x = w
Training a Perceptron: Example
Initialize weights to zero, w1 = 0,w2 = 0
Input, x1 = 0, x2 = 1
t = f( w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 1) = f(0) = 0
w = w + α(y − t)x = w + (1− 0)x = w + x
w = [0, 1]
Training a Perceptron: Example
Current weights, w1 = 0, w2 = 1
Input, x1 = 1, x2 = 0
t = f( w1 × x1 + w2 × x2) = f(0 × 1 + 1 × 0) = f(0) = 0
w = w + α(y − t)x = w + (1− 0)x = w + x
w = [1, 1]
Training a Perceptron: Example
Current weights, w1 = 1, w2 = 1
Input, x1 = 1, x2 = 1
t = f( w1 × x1 + w2 × x2) = f(1 × 1 + 1 × 1) = f(2) = 1
w = w + α(y − t)x = w + (1− 1)x = w
Training a Perceptron: Example
[Plot: OR data points in the (x1, x2) plane with the learned decision boundary; red circles indicate 1, white circles indicate 0]
Disadvantages of Perceptron Algorithm
Cannot learn functions that are not linearly separable
Famous XOR example
Disadvantages of Perceptron Algorithm
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0
Train a perceptron to replicate XOR gate
Learn weights w1,w2
Disadvantages of Perceptron Algorithm
[Plot: XOR data points in the (x1, x2) plane; red circles indicate 1, white circles indicate 0; no single line separates the two classes]
Disadvantages of Perceptron Algorithm
A single perceptron cannot learn an XOR function
Need multiple perceptrons
What about a hierarchy of perceptrons connected together?
Multilayer Perceptron for XOR Problem
Hidden unit y1 = f(w1 x1 + w2 x2), with w1 = −1, w2 = 1
Hidden unit y2 = f(w4 x1 + w3 x2), with w4 = 1, w3 = −1
Output y = f(w5 y1 + w6 y2), with w5 = 1, w6 = 1
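These weights can be checked directly; a small sketch assuming the step activation f(s) = 1 if s > 0 else 0 (the slide does not state the threshold):

```python
def step(s):
    """Step activation: fires only on a strictly positive weighted sum."""
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    y1 = step(-1 * x1 + 1 * x2)   # w1 = -1, w2 = 1
    y2 = step( 1 * x1 - 1 * x2)   # w4 = 1,  w3 = -1
    return step(1 * y1 + 1 * y2)  # w5 = w6 = 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # reproduces the XOR truth table
```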
Feedforward Neural Networks
Basically, a hierarchy of perceptrons
With much smoother activation functions, such as:
Sigmoid
Tanh
ReLU
...
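A minimal sketch of these activation functions in Python (ReLU is piecewise linear rather than smooth, but unlike the hard step it is still usable with gradient-based training):

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes the real line into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)
```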
Feedforward Neural Networks (contd.)
[Diagram: a feedforward network with an input layer, two hidden layers, and an output layer]
Feedforward Neural Network: Forward Propagation
Let X = (x_1, ..., x_n) be the set of input features
Hidden layer activations: a_j = f(∑_{i=1}^{n} W_ji x_i), ∀j ∈ 1, ..., h
Feedforward Neural Network: Forward Propagation
Let a = (a_1, ..., a_h) be the set of hidden layer features
Output neurons: o_k = g(∑_{j=1}^{h} U_kj a_j), ∀k ∈ 1, ..., K
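The two formulas can be sketched as a single forward pass; sigmoid is assumed for both f and g here, since the slides leave them generic:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, U, x):
    """W: h x n hidden weights, U: K x h output weights, x: n inputs."""
    # a_j = f(sum_i W_ji x_i)
    a = [sigmoid(sum(W[j][i] * x[i] for i in range(len(x))))
         for j in range(len(W))]
    # o_k = g(sum_j U_kj a_j)
    o = [sigmoid(sum(U[k][j] * a[j] for j in range(len(a))))
         for k in range(len(U))]
    return a, o
```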
Feedforward Neural Network: Learning Algorithm
Adjust weights W and U to minimize the error on training set
Define the error to be the squared loss between predictions and true outputs:

E = (1/2) Error² = (1/2) (y − o)²   (1)

The gradient w.r.t. the output is

∂E/∂o_k = (1/2) × 2 × (o_k − y_k) = (o_k − y_k)   (2)
Feedforward Neural Network: Learning Algorithm
We have the errors calculated at output neurons
Send the error to lower layers
Feedforward Neural Network: Learning Algorithm
Calculate the gradient w.r.t. the parameters U

∂E/∂o_k = (o_k − y_k), from Eq. (2)

o_k = g(∑_{j=1}^{h} U_kj a_j), ∀k ∈ 1, ..., K

∂E/∂U_kj = ∂E/∂o_k × g′(∑_{j=1}^{h} U_kj a_j) × a_j   (3)

The update for U_kj is

U_kj = U_kj − η × ∂E/∂U_kj   (4)
Feedforward Neural Network: Learning Algorithm
How to update the parameters W?

a_j = f(∑_{i=1}^{n} W_ji x_i)
o_k = g(∑_{j=1}^{h} U_kj a_j)

Substituting for a_j: o_k = g(∑_{j=1}^{h} U_kj f(∑_{i=1}^{n} W_ji x_i))

Calculate the gradient w.r.t. a_j
Feedforward Neural Network: Learning Algorithm
o_k = g(∑_{j=1}^{h} U_kj a_j)

We have already calculated ∂E/∂o_k

∂E/∂a_j = ∑_{k=1}^{K} ∂E/∂o_k × g′(∑_{j=1}^{h} U_kj a_j) × U_kj   (5)
Feedforward Neural Network: Backpropagation of errors
[Diagram: update of parameters, indicated by red lines]
Feedforward Neural Network: Backpropagation of errors
Errors are now accumulated at hidden layer neurons
Feedforward Neural Network: Backpropagation of errors

We have calculated the error accumulated at each hidden neuron, ∂E/∂a_j
Use this to update the parameters W

a_j = f(∑_{i=1}^{n} W_ji x_i), ∀j ∈ 1, ..., h

∂E/∂W_ji = ∂E/∂a_j × f′(∑_{i=1}^{n} W_ji x_i) × x_i   (6)

The update for W_ji is

W_ji = W_ji − η × ∂E/∂W_ji   (7)
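Putting Eqs. (1)-(7) together, one training step can be sketched as follows; sigmoid is assumed for both activations (so f′(z) = f(z)(1 − f(z))), and all names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(W, U, x, y, eta=0.5):
    """One gradient step on a single (x, y) example; mutates W and U in place."""
    # forward pass: a_j = f(sum_i W_ji x_i), o_k = g(sum_j U_kj a_j)
    a = [sigmoid(sum(W[j][i] * x[i] for i in range(len(x)))) for j in range(len(W))]
    o = [sigmoid(sum(U[k][j] * a[j] for j in range(len(a)))) for k in range(len(U))]
    # output deltas: dE/do_k = (o_k - y_k), with g' folded in
    d_o = [(o[k] - y[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    # hidden deltas: dE/da_j = sum_k dE/do_k * g' * U_kj, with f' folded in
    d_a = [sum(d_o[k] * U[k][j] for k in range(len(o))) * a[j] * (1 - a[j])
           for j in range(len(a))]
    # Eq. (4): U_kj -= eta * dE/dU_kj, where dE/dU_kj = d_o[k] * a[j]
    for k in range(len(U)):
        for j in range(len(a)):
            U[k][j] -= eta * d_o[k] * a[j]
    # Eq. (7): W_ji -= eta * dE/dW_ji, where dE/dW_ji = d_a[j] * x[i]
    for j in range(len(W)):
        for i in range(len(x)):
            W[j][i] -= eta * d_a[j] * x[i]
    return o
```

Repeated calls on the same example move the output steadily toward the target, since each step descends the squared loss of Eq. (1).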
Feedforward Neural Network: Backpropagation of errors
[Diagram: update of parameters, indicated by red lines]
FeedForward Neural Network
The training proceeds in an online fashion
Minibatches are also used (i.e., parameters are updated after seeing k examples)
Monitor the error on a validation set after one complete sweep of the training set
Training repeats until the error on the validation set stops decreasing
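The schedule above can be sketched as a simple loop; `train_step` and `valid_error` are hypothetical callables standing in for the network update and the validation-error computation:

```python
def fit(train_set, k, train_step, valid_error, max_epochs=100):
    """Minibatch training with early stopping on the validation error."""
    best = float("inf")
    for epoch in range(max_epochs):
        # one complete sweep of the training set, k examples at a time
        for start in range(0, len(train_set), k):
            train_step(train_set[start:start + k])
        err = valid_error()
        if err >= best:          # validation error stopped decreasing
            break
        best = err
    return best
```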
Greedy Layerwise Unsupervised Pretraining
Problems with Feedforward Networks
Vanishing gradient
Getting stuck in local optima
Problems with Feedforward Networks (contd.)
[Diagram: a deep feedforward network with an input layer, eight hidden layers, and an output layer]
Problems with Feedforward Networks (contd.)
Can we instead start at a relatively good position?
Then even small updates will not be much of an issue
Increased number of parameters
Solution to both: Unsupervised Learning
Advantages of Unsupervised Learning
Unsupervised learning needs no labels
And there is a lot of unlabeled data (is the whole of the Internet enough?)
One option: use the input itself as the label
AutoEncoders
Two functions: an encoder f_θ and a decoder g_θ
A datapoint x's representation: h = f_θ(x)
Reconstruction by the decoder: r = g_θ(h)
Goal: minimize the reconstruction error between x and r
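A minimal sketch of such an autoencoder, assuming linear activations and illustrative 4 → 2 → 4 dimensions (all names are ours, not from the slides):

```python
import random
random.seed(0)

n_in, n_hid = 4, 2
Wenc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
Wdec = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]

def train_step(x, eta=0.05):
    """One gradient step on the squared reconstruction error; returns the error."""
    h = [sum(Wenc[j][i] * x[i] for i in range(n_in)) for j in range(n_hid)]   # h = f(x)
    r = [sum(Wdec[i][j] * h[j] for j in range(n_hid)) for i in range(n_in)]   # r = g(h)
    err = [r[i] - x[i] for i in range(n_in)]
    # backprop through the (linear) decoder before updating it
    d_h = [sum(err[i] * Wdec[i][j] for i in range(n_in)) for j in range(n_hid)]
    for i in range(n_in):
        for j in range(n_hid):
            Wdec[i][j] -= eta * err[i] * h[j]
    for j in range(n_hid):
        for i in range(n_in):
            Wenc[j][i] -= eta * d_h[j] * x[i]
    return sum(e * e for e in err)
```

Calling `train_step` repeatedly on a datapoint drives the reconstruction error down, which is exactly the training objective stated above.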
AutoEncoder Example
AutoEncoder Trivial Solution: the identity function, which reconstructs perfectly but learns no useful representation
Regularized AutoEncoders
Sparse AutoEncoders
Denoising AutoEncoders
Contractive AutoEncoders
Sparse AutoEncoder
Denoising AutoEncoder
Contractive AutoEncoder
Greedy Layerwise Unsupervised Pretraining
The main reason for major success of Deep Learning
Definition:
Greedy layerwise: each layer is trained one at a time, to a local optimum
Unsupervised: the intermediate layers do not need the final labels for training
Pretraining: afterwards, a supervised training step is applied to the entire network
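The whole procedure can be sketched with small linear autoencoders standing in for each layer; the helper names and dimensions are illustrative assumptions, not from the slides:

```python
import random
random.seed(0)

def train_linear_ae(data, n_in, n_hid, eta=0.05, steps=100):
    """Train one autoencoder layer on `data`; return its encoder weights."""
    enc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    dec = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
    for _ in range(steps):
        for x in data:
            h = [sum(enc[j][i] * x[i] for i in range(n_in)) for j in range(n_hid)]
            r = [sum(dec[i][j] * h[j] for j in range(n_hid)) for i in range(n_in)]
            err = [r[i] - x[i] for i in range(n_in)]
            d_h = [sum(err[i] * dec[i][j] for i in range(n_in)) for j in range(n_hid)]
            for i in range(n_in):
                for j in range(n_hid):
                    dec[i][j] -= eta * err[i] * h[j]
            for j in range(n_hid):
                for i in range(n_in):
                    enc[j][i] -= eta * d_h[j] * x[i]
    return enc

def pretrain(data, sizes):
    """Greedily pretrain encoder weights for consecutive layer sizes."""
    weights, reps = [], data
    for n_in, n_hid in zip(sizes, sizes[1:]):
        enc = train_linear_ae(reps, n_in, n_hid)
        weights.append(enc)
        # the next layer trains on this layer's representations
        reps = [[sum(enc[j][i] * x[i] for i in range(n_in))
                 for j in range(n_hid)] for x in reps]
    return weights  # used to initialize the deep net before supervised fine-tuning
```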
Stacked AutoEncoders
Greedy Layerwise Unsupervised Pretraining in action
[Diagram: the full deep network, with an input layer, three hidden layers, and an output layer]
[Diagram: first autoencoder, input layer → hidden layer 1 → reconstructed input layer]
[Diagram: second autoencoder, hidden layer 1 → hidden layer 2 → reconstructed hidden layer 1]
[Diagram: third autoencoder, hidden layer 2 → hidden layer 3 → reconstructed hidden layer 2]
Advanced Architectures and Optimization Algorithms
Architectures
Convolutional Neural Networks
Recurrent Neural Networks
Recursive Neural Networks
Optimization Algorithms
RMSProp
AdaGrad
AdaDelta
Conclusion
Motivated AI in general
Discussed perceptron and feedforward neural networks
Understood the shortcomings of naive attempts at training deep networks
Understood Greedy Layerwise Unsupervised Pretraining
Thank You