Motivation · Background of Neural Networks · Deep Learning · Conclusion
Deep Learning
Kevin Patel
June 24, 2016
Kevin Patel Deep Learning 1 / 61
Outline
1 Motivation
2 Background of Neural Networks (Perceptron, Feedforward Neural Networks)
3 Deep Learning (Greedy Layerwise Unsupervised Pretraining)
4 Conclusion
Human Beings and Computer Science
What qualities of human beings make them good computer scientists?
Impatience
Laziness
Machine Translation
Information Retrieval
Sentiment Analysis
Human Beings and Computer Science
What qualities of human beings make them good computer scientists?
Impatience
Laziness
Major reasons for the evolution of the field of Artificial Intelligence
Know the limits
AI always fascinates people
Hyped by movies
Important to know what is actually possible
Also helps to decide where to push
Classification of AI systems
Two major categories based on how they work:
Rule-based
Statistical
Statistical systems are further classified based on their data usage:
Supervised
Unsupervised
Statistical systems are also classified based on the problem solved:
Classification
Regression
...
A Refresher in Linear Algebra
Vector representations
Vector operations
Similarity between vectors
A Refresher in Optimization
Minima and Maxima
Local and Global Minima
Partial Differentiation from Calculus
Perceptron · Feedforward Neural Networks
The Ultimate Computing Machine
The award for the most amazing computing machine goes to
Human Brain
Who gave it this award?
We, the researchers of AI, did
How exactly?
By constantly trying to imitate it
Perceptron Algorithm
Given a set of input/label pairs (x_1, y_1), ..., (x_n, y_n)
Learn a function that classifies the inputs
Learn a set of weights (w_1, ..., w_m) for the input features

f(x) = 1 if ∑_{i=1}^{m} w_i x_i > 0, and 0 otherwise
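As a quick sketch, the decision rule above can be written directly in Python (function and variable names are illustrative, not from the slides):

```python
def predict(weights, x):
    """Perceptron decision rule: 1 if the weighted sum of inputs is positive, else 0."""
    s = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return 1 if s > 0 else 0

# e.g. with weights [1, 1], a linear separator for OR:
print(predict([1.0, 1.0], [0, 1]))  # 1
```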
Perceptron Algorithm
[Diagram: inputs x_0, x_1, ..., x_n with weights w_1, ..., w_n feeding a weighted sum ∑ and an activation function]
Perceptron Algorithm
[Diagram: Problem → Hand-crafted Features → Trainable Classifier → Output]
Perceptron Example
The perceptron calculates y = ∑_{i=1}^{m} w_i x_i + b
This is similar to y = m × x + c, the equation of a line in 2D (a hyperplane in general)
It divides the input feature space into two regions (positive and negative class regions)
Training a Perceptron
Algorithm 2.1: Perceptron Algorithm(D)
  comment: Initialize the feature weights to zero or randomly
  w = zeros()
  for each (x, y) ∈ D do:
      comment: Calculate prediction
      t = f(∑_{i=1}^{m} w_i x_i)
      comment: Update the feature weights
      w = w + α(y − t)x
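The training loop above can be sketched in a few lines of Python; the OR-gate data from the worked example in the slides is used here, and the learning rate α is assumed to be 1, matching that example:

```python
def train_perceptron(data, alpha=1.0, epochs=10):
    """Train a perceptron on (x, y) pairs; returns the learned weight vector."""
    m = len(data[0][0])
    w = [0.0] * m                                   # zero initialization
    for _ in range(epochs):
        for x, y in data:
            # prediction t = f(sum_i w_i x_i), with f the step function
            t = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            # update w = w + alpha * (y - t) * x
            w = [wi + alpha * (y - t) * xi for wi, xi in zip(w, x)]
    return w

OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(OR, epochs=1))  # [1.0, 1.0] after one pass, as in the worked example
```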
Training a Perceptron: Example
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 1
Train a perceptron for OR gate
Learn weights w = [w1,w2]
Training a Perceptron: Example
Initialize weights to zero, w1 = 0,w2 = 0
Input, x1 = 0, x2 = 0
t = f( w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 0) = f(0) = 0
w = w + α(y − t)x = w + (0− 0)x = w
Training a Perceptron: Example
Initialize weights to zero, w1 = 0,w2 = 0
Input, x1 = 0, x2 = 1
t = f( w1 × x1 + w2 × x2) = f(0 × 0 + 0 × 1) = f(0) = 0
w = w + α(y − t)x = w + (1− 0)x = w + x
w = [0, 1]
Training a Perceptron: Example
Current weights, w1 = 0, w2 = 1
Input, x1 = 1, x2 = 0
t = f( w1 × x1 + w2 × x2) = f(0 × 1 + 1 × 0) = f(0) = 0
w = w + α(y − t)x = w + (1− 0)x = w + x
w = [1, 1]
Training a Perceptron: Example
Current weights, w1 = 1, w2 = 1
Input, x1 = 1, x2 = 1
t = f( w1 × x1 + w2 × x2) = f(1 × 1 + 1 × 1) = f(2) = 1
w = w + α(y − t)x = w + (1− 1)x = w
Training a Perceptron: Example
[Plot: OR data points in the (x1, x2) plane with the learned decision boundary; red circles indicate 1, white circles indicate 0]
Disadvantages of Perceptron Algorithm
Cannot learn functions that are not linearly separable
Famous XOR example
Disadvantages of Perceptron Algorithm
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0
Train a perceptron to replicate XOR gate
Learn weights w1,w2
Disadvantages of Perceptron Algorithm
[Plot: XOR data points in the (x1, x2) plane; red circles indicate 1, white circles indicate 0; no single line separates the two classes]
Disadvantages of Perceptron Algorithm
A single perceptron cannot learn an XOR function
Need multiple perceptrons
What about a hierarchy of perceptrons connected together?
Multilayer Perceptron for XOR Problem
Hidden unit y1 = f(w1 x1 + w2 x2), with w1 = −1, w2 = 1
Hidden unit y2 = f(w4 x1 + w3 x2), with w4 = 1, w3 = −1
Output y = f(w5 y1 + w6 y2), with w5 = 1, w6 = 1
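These weights can be checked directly; a small sketch assuming the step activation f(s) = 1 if s > 0 else 0 (the slide does not state the threshold):

```python
def step(s):
    """Step activation: fires only on a strictly positive weighted sum."""
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    y1 = step(-1 * x1 + 1 * x2)   # w1 = -1, w2 = 1
    y2 = step( 1 * x1 - 1 * x2)   # w4 = 1,  w3 = -1
    return step(1 * y1 + 1 * y2)  # w5 = w6 = 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # reproduces the XOR truth table
```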
Feedforward Neural Networks
Basically, a hierarchy of perceptrons
With much smoother activation functions, such as:
Sigmoid
Tanh
ReLU
...
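A minimal sketch of these activation functions in Python (ReLU is piecewise linear rather than smooth, but unlike the hard step it is still usable with gradient-based training):

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes the real line into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)
```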
Feedforward Neural Networks (contd.)
[Diagram: a feedforward network with an input layer, two hidden layers, and an output layer]
Feedforward Neural Network: Forward Propagation
Let X = (x_1, ..., x_n) be the set of input features
Hidden layer activations: a_j = f(∑_{i=1}^{n} W_ji x_i), ∀j ∈ 1, ..., h
Feedforward Neural Network: Forward Propagation
Let a = (a_1, ..., a_h) be the set of hidden layer features
Output neurons: o_k = g(∑_{j=1}^{h} U_kj a_j), ∀k ∈ 1, ..., K
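The two formulas can be sketched as a single forward pass; sigmoid is assumed for both f and g here, since the slides leave them generic:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, U, x):
    """W: h x n hidden weights, U: K x h output weights, x: n inputs."""
    # a_j = f(sum_i W_ji x_i)
    a = [sigmoid(sum(W[j][i] * x[i] for i in range(len(x))))
         for j in range(len(W))]
    # o_k = g(sum_j U_kj a_j)
    o = [sigmoid(sum(U[k][j] * a[j] for j in range(len(a))))
         for k in range(len(U))]
    return a, o
```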
Feedforward Neural Network: Learning Algorithm
Adjust weights W and U to minimize the error on training set
Define the error to be the squared loss between predictions and true outputs:

E = (1/2) Error² = (1/2) (y − o)²   (1)

The gradient w.r.t. the output is

∂E/∂o_k = (1/2) × 2 × (o_k − y_k) = (o_k − y_k)   (2)
Feedforward Neural Network: Learning Algorithm
We have the errors calculated at output neurons
Send the error to lower layers
Feedforward Neural Network: Learning Algorithm
Calculate the gradient w.r.t. the parameters U

∂E/∂o_k = (o_k − y_k), from Eq. (2)

o_k = g(∑_{j=1}^{h} U_kj a_j), ∀k ∈ 1, ..., K

∂E/∂U_kj = ∂E/∂o_k × g′(∑_{j=1}^{h} U_kj a_j) × a_j   (3)

The update for U_kj is

U_kj = U_kj − η × ∂E/∂U_kj   (4)
Feedforward Neural Network: Learning Algorithm
How to update the parameters W?

a_j = f(∑_{i=1}^{n} W_ji x_i)
o_k = g(∑_{j=1}^{h} U_kj a_j)

Substituting for a_j: o_k = g(∑_{j=1}^{h} U_kj f(∑_{i=1}^{n} W_ji x_i))

Calculate the gradient w.r.t. a_j
Feedforward Neural Network: Learning Algorithm
o_k = g(∑_{j=1}^{h} U_kj a_j)

We have already calculated ∂E/∂o_k

∂E/∂a_j = ∑_{k=1}^{K} ∂E/∂o_k × g′(∑_{j=1}^{h} U_kj a_j) × U_kj   (5)
Feedforward Neural Network: Backpropagation of errors
[Diagram: update of parameters, indicated by red lines]
Feedforward Neural Network: Backpropagation of errors
Errors are now accumulated at hidden layer neurons
Feedforward Neural Network: Backpropagation of errors

We have calculated the error accumulated at each hidden neuron, ∂E/∂a_j
Use this to update the parameters W

a_j = f(∑_{i=1}^{n} W_ji x_i), ∀j ∈ 1, ..., h

∂E/∂W_ji = ∂E/∂a_j × f′(∑_{i=1}^{n} W_ji x_i) × x_i   (6)

The update for W_ji is

W_ji = W_ji − η × ∂E/∂W_ji   (7)
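Putting Eqs. (1)-(7) together, one training step can be sketched as follows; sigmoid is assumed for both activations (so f′(z) = f(z)(1 − f(z))), and all names are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(W, U, x, y, eta=0.5):
    """One gradient step on a single (x, y) example; mutates W and U in place."""
    # forward pass: a_j = f(sum_i W_ji x_i), o_k = g(sum_j U_kj a_j)
    a = [sigmoid(sum(W[j][i] * x[i] for i in range(len(x)))) for j in range(len(W))]
    o = [sigmoid(sum(U[k][j] * a[j] for j in range(len(a)))) for k in range(len(U))]
    # output deltas: dE/do_k = (o_k - y_k), with g' folded in
    d_o = [(o[k] - y[k]) * o[k] * (1 - o[k]) for k in range(len(o))]
    # hidden deltas: dE/da_j = sum_k dE/do_k * g' * U_kj, with f' folded in
    d_a = [sum(d_o[k] * U[k][j] for k in range(len(o))) * a[j] * (1 - a[j])
           for j in range(len(a))]
    # Eq. (4): U_kj -= eta * dE/dU_kj, where dE/dU_kj = d_o[k] * a[j]
    for k in range(len(U)):
        for j in range(len(a)):
            U[k][j] -= eta * d_o[k] * a[j]
    # Eq. (7): W_ji -= eta * dE/dW_ji, where dE/dW_ji = d_a[j] * x[i]
    for j in range(len(W)):
        for i in range(len(x)):
            W[j][i] -= eta * d_a[j] * x[i]
    return o
```

Repeated calls on the same example move the output steadily toward the target, since each step descends the squared loss of Eq. (1).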
Feedforward Neural Network: Backpropagation of errors
[Diagram: update of parameters, indicated by red lines]
FeedForward Neural Network
The training proceeds in an online fashion
Minibatches are also used (i.e., parameters are updated after seeing k examples)
Monitor the error on a validation set after one complete sweep of the training set
Training repeats until the error on the validation set stops decreasing
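The schedule above can be sketched as a simple loop; `train_step` and `valid_error` are hypothetical callables standing in for the network update and the validation-error computation:

```python
def fit(train_set, k, train_step, valid_error, max_epochs=100):
    """Minibatch training with early stopping on the validation error."""
    best = float("inf")
    for epoch in range(max_epochs):
        # one complete sweep of the training set, k examples at a time
        for start in range(0, len(train_set), k):
            train_step(train_set[start:start + k])
        err = valid_error()
        if err >= best:          # validation error stopped decreasing
            break
        best = err
    return best
```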
Greedy Layerwise Unsupervised Pretraining
Problems with Feedforward Networks
Vanishing gradient
Getting stuck in local optima
Problems with Feedforward Networks (contd.)
[Diagram: a deep feedforward network with an input layer, eight hidden layers, and an output layer]
Problems with Feedforward Networks (contd.)
Can we instead start at a relatively good position?
Then even small updates will not be much of an issue
Increased number of parameters
Solution to both: Unsupervised Learning
Advantages of Unsupervised Learning
Unsupervised learning needs no labels
And there is a lot of unlabeled data (is the whole of the Internet enough?)
One option: use the input itself as the label
AutoEncoders
Two functions: an encoder f_θ and a decoder g_θ
A datapoint x's representation: h = f_θ(x)
Reconstruction by the decoder: r = g_θ(h)
Goal: minimize the reconstruction error between x and r
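A minimal sketch of such an autoencoder, assuming linear activations and illustrative 4 → 2 → 4 dimensions (all names are ours, not from the slides):

```python
import random
random.seed(0)

n_in, n_hid = 4, 2
Wenc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
Wdec = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]

def train_step(x, eta=0.05):
    """One gradient step on the squared reconstruction error; returns the error."""
    h = [sum(Wenc[j][i] * x[i] for i in range(n_in)) for j in range(n_hid)]   # h = f(x)
    r = [sum(Wdec[i][j] * h[j] for j in range(n_hid)) for i in range(n_in)]   # r = g(h)
    err = [r[i] - x[i] for i in range(n_in)]
    # backprop through the (linear) decoder before updating it
    d_h = [sum(err[i] * Wdec[i][j] for i in range(n_in)) for j in range(n_hid)]
    for i in range(n_in):
        for j in range(n_hid):
            Wdec[i][j] -= eta * err[i] * h[j]
    for j in range(n_hid):
        for i in range(n_in):
            Wenc[j][i] -= eta * d_h[j] * x[i]
    return sum(e * e for e in err)
```

Calling `train_step` repeatedly on a datapoint drives the reconstruction error down, which is exactly the training objective stated above.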
AutoEncoder Example
AutoEncoder Trivial Solution: the identity function, which reconstructs perfectly but learns no useful representation
Regularized AutoEncoders
Sparse AutoEncoders
Denoising AutoEncoders
Contractive AutoEncoders
Sparse AutoEncoder
Denoising AutoEncoder
Contractive AutoEncoder
Greedy Layerwise Unsupervised Pretraining
The main reason for major success of Deep Learning
Definition:
Greedy layerwise: each layer is trained one at a time, to a local optimum
Unsupervised: the intermediate layers do not need the final labels for training
Pretraining: afterwards, a supervised training step is applied to the entire network
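The whole procedure can be sketched with small linear autoencoders standing in for each layer; the helper names and dimensions are illustrative assumptions, not from the slides:

```python
import random
random.seed(0)

def train_linear_ae(data, n_in, n_hid, eta=0.05, steps=100):
    """Train one autoencoder layer on `data`; return its encoder weights."""
    enc = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    dec = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
    for _ in range(steps):
        for x in data:
            h = [sum(enc[j][i] * x[i] for i in range(n_in)) for j in range(n_hid)]
            r = [sum(dec[i][j] * h[j] for j in range(n_hid)) for i in range(n_in)]
            err = [r[i] - x[i] for i in range(n_in)]
            d_h = [sum(err[i] * dec[i][j] for i in range(n_in)) for j in range(n_hid)]
            for i in range(n_in):
                for j in range(n_hid):
                    dec[i][j] -= eta * err[i] * h[j]
            for j in range(n_hid):
                for i in range(n_in):
                    enc[j][i] -= eta * d_h[j] * x[i]
    return enc

def pretrain(data, sizes):
    """Greedily pretrain encoder weights for consecutive layer sizes."""
    weights, reps = [], data
    for n_in, n_hid in zip(sizes, sizes[1:]):
        enc = train_linear_ae(reps, n_in, n_hid)
        weights.append(enc)
        # the next layer trains on this layer's representations
        reps = [[sum(enc[j][i] * x[i] for i in range(n_in))
                 for j in range(n_hid)] for x in reps]
    return weights  # used to initialize the deep net before supervised fine-tuning
```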
Stacked AutoEncoders
Greedy Layerwise Unsupervised Pretraining in action
[Diagram: the full deep network, with an input layer, three hidden layers, and an output layer]
[Diagram: first autoencoder, input layer → hidden layer 1 → reconstructed input layer]
[Diagram: second autoencoder, hidden layer 1 → hidden layer 2 → reconstructed hidden layer 1]
[Diagram: third autoencoder, hidden layer 2 → hidden layer 3 → reconstructed hidden layer 2]
Advanced Architectures and Optimization Algorithms
Architectures
Convolutional Neural Networks
Recurrent Neural Networks
Recursive Neural Networks
Optimization Algorithms
RMSProp
AdaGrad
AdaDelta
Conclusion
Motivated AI in general
Discussed perceptron and feedforward neural networks
Understood the shortcomings of naive attempts at training deep networks
Understood Greedy Layerwise Unsupervised Pretraining
Thank You