Neural Networks - University of California, Irvine
Neural Networks
Prof. Xiaohui Xie, Spring 2019
CS 273P Machine Learning and Data Mining
Slides courtesy of Alex Ihler
Machine Learning
Multi-Layer Perceptrons
Backpropagation Learning
Convolutional Neural Networks
• Linear classifiers
– a linear classifier is a mapping which partitions feature space using a linear function (a straight line, or a hyperplane)
– separates the two classes using a straight line in feature space
– in 2 dimensions the decision boundary is a straight line
[Two plots: linearly separable data and linearly non-separable data; axes are feature 1 (x1) and feature 2 (x2), each showing a linear decision boundary.]
Linear classifiers (perceptrons)
Perceptron Classifier (2 features)
[Diagram: inputs x1, x2, and a constant 1 feed a weighted sum with weights θ1, θ2, θ0; the threshold function T(r) turns the linear response r into a class decision in {-1, +1} (or {0, 1}).]

"linear response": r = θ1 x1 + θ2 x2 + θ0

r    = X.dot( theta.T )   # compute linear response
Yhat = 2*(r > 0) - 1      # "sign": predict +1 / -1
Decision boundary at r(x) = 0
Solve: x2 = -(θ1/θ2) x1 - (θ0/θ2)  (a line)
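A runnable sketch of this prediction rule (NumPy assumed; the weights and data points are made-up examples, with the constant feature 1 as the first column of X so theta = [θ0, θ1, θ2]):

    import numpy as np

    theta = np.array([[-1.0, 2.0, 0.5]])   # example [theta0, theta1, theta2]
    X = np.array([[1.0, 0.3, 2.0],         # each row: [1, x1, x2]
                  [1.0, 1.5, -0.5]])

    r    = X.dot( theta.T )                # compute linear response
    Yhat = 2*(r > 0) - 1                   # "sign": predict +1 / -1

    # boundary line: x2 = -(theta1/theta2) x1 - (theta0/theta2)
    x1 = np.linspace(-2, 2, 5)
    x2 = -(theta[0,1]/theta[0,2])*x1 - theta[0,0]/theta[0,2]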
Perceptron Classifier (2 features)
Decision boundary = "x such that T(θ1 x + θ0) transitions"
1D example: T(r) = -1 if r < 0;  T(r) = +1 if r > 0
• Recall the role of features
– We can create extra features that allow more complex decision boundaries
– Linear classifiers
– Features [1, x]
• Decision rule: T(ax + b), i.e., predict by the sign of ax + b
• Boundary ax + b = 0 => a point
– Features [1, x, x²]
• Decision rule: T(ax² + bx + c)
• Boundary ax² + bx + c = 0 => ? (see the sketch below)
– What features can produce this decision rule?
Features and perceptrons
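To make the quadratic case concrete, a minimal sketch (NumPy assumed; weights chosen by hand for illustration): with features [1, x, x²], the perceptron is still linear in its features, but its boundary in x is quadratic, carving out an interval:

    import numpy as np

    x = np.linspace(-3, 3, 7)
    Phi = np.stack([np.ones_like(x), x, x**2], axis=1)  # features [1, x, x^2]

    theta = np.array([2.0, 0.0, -1.0])  # example weights: boundary 2 - x^2 = 0
    r = Phi.dot(theta)                  # linear in features, quadratic in x
    yhat = np.where(r > 0, 1, -1)       # +1 for |x| < sqrt(2), -1 outside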
• Recall the role of features
– We can create extra features that allow more complex decision boundaries
– For example, polynomial features: Φ(x) = [1 x x² x³ …]
• What other kinds of features could we choose?
– Step functions?
[Plot: step-function features F1, F2, F3.]
Linear function of features: a F1 + b F2 + c F3 + d
Ex: F1 - F2 + F3
Features and perceptrons
• Step functions are just perceptrons!
– "Features" are outputs of a perceptron
– Combination of features = output of another perceptron

[Diagram: input x1 and a constant 1 feed three hidden perceptrons F1, F2, F3 through weights (w10, w11), (w20, w21), (w30, w31): the "hidden layer". Their outputs feed one more perceptron "Out" through weights w1, w2, w3: the "output layer".]

Linear function of features: a F1 + b F2 + c F3 + d
Ex: F1 - F2 + F3

       [ w10  w11 ]
W1 =   [ w20  w21 ]          W2 = [ w1  w2  w3 ]
       [ w30  w31 ]
Multi-layer perceptron model
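A minimal sketch of this two-layer model with step activations (NumPy assumed; the weights are illustrative, chosen so the network computes F1 - F2 + F3):

    import numpy as np

    def step(r):                       # perceptron threshold function
        return (r > 0).astype(float)

    W1 = np.array([[ 0.5, 1.0],        # rows: [w_i0, w_i1] for F1, F2, F3
                   [-0.5, 1.0],
                   [-1.5, 1.0]])
    W2 = np.array([1.0, -1.0, 1.0])    # output weights: F1 - F2 + F3

    x   = np.array([0.7])
    F   = step(W1.dot(np.concatenate(([1.0], x))))  # hidden layer outputs
    out = W2.dot(F)                                 # output: linear combination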
Regression version: remove the activation function from the output node, so the output is a linear combination of the hidden features.
• Simple building blocks
– Each element is just a perceptron function
• Can build upwards
[Diagram: input features feed a single perceptron: a step function, i.e., a linear partition of the input space.]
Features of MLPs
• Simple building blocks
– Each element is just a perceptron function
• Can build upwards
2-layer: "features" are now partitions; the output is any linear combination of those partitions. [Diagram: input features -> Layer 1 -> output.]
Features of MLPs
• Simple building blocks
– Each element is just a perceptron function
• Can build upwards
3-layer: "features" are now complex functions; the output is any linear combination of those. [Diagram: input features -> Layer 1 -> Layer 2 -> output.]
Features of MLPs
• Simple building blocks
– Each element is just a perceptron function
• Can build upwards
Current research: "deep" architectures (many layers). [Diagram: input features -> Layer 1 -> Layer 2 -> Layer 3 -> …]
Features of MLPs
• Simple building blocks
– Each element is just a perceptron function
• Can build upwards
• Flexible function approximation
– Approximate arbitrary functions with enough hidden nodes (a sketch follows below)
[Diagram: input features x0, x1, … feed hidden units h1, h2, … (Layer 1); the output y = v0 h1 + v1 h2 + … combines the hidden responses; plots show h1, h2, and the resulting bump.]
Features of MLPs
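A sketch of why this works (NumPy assumed; weights are illustrative): two logistic hidden units of opposite sign form a localized "bump", and a weighted sum of shifted bumps can approximate an arbitrary smooth function:

    import numpy as np

    def sig(r):
        return 1.0 / (1.0 + np.exp(-r))

    x  = np.linspace(-5, 5, 201)
    h1 = sig(10*(x + 1.0))      # turns on near x = -1
    h2 = sig(10*(x - 1.0))      # turns on near x = +1
    bump = h1 - h2              # ~1 on (-1, 1), ~0 elsewhere

    # a weighted sum of shifted bumps approximates a target function
    y = 0.5*bump + 0.8*(sig(10*(x - 2.0)) - sig(10*(x - 3.0)))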
• Another term for MLPs
• Biological motivation
• Neurons
– "Simple" cells
– Dendrites sense charge
– Cell weighs inputs
– "Fires" axon
[Diagram: a neuron weighing inputs w1, w2, w3; image from "How stuff works: the brain".]
Neural networks
Logistic
Hyperbolic Tangent
Gaussian
ReLU (rectified linear)
Linear
and many others…
Activation functions
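For concreteness, a sketch of these activation functions in NumPy (standard definitions; the unit Gaussian bandwidth is an assumed example):

    import numpy as np

    def linear(r):   return r
    def logistic(r): return 1.0 / (1.0 + np.exp(-r))
    def tanh(r):     return np.tanh(r)
    def gaussian(r): return np.exp(-r**2)       # unit bandwidth (assumed)
    def relu(r):     return np.maximum(0, r)    # rectified linear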
Feed-forward networks
• Information flows left-to-right
– Input observed features
– Compute hidden nodes (in parallel)
– Compute next layer…
R  = X.dot(W[0]) + B[0]   # linear response
H1 = Sig( R )             # activation f'n
S  = H1.dot(W[1]) + B[1]  # linear response
H2 = Sig( S )             # activation f'n
[Diagram: X -(W[0])-> H1 -(W[1])-> H2; information flows left to right.]
Feed-forward networks
• Information flows left-to-right
– Input observed features
– Compute hidden nodes (in parallel)
– Compute next layer…
• Alternative: recurrent NNs…
X1 = _add1(X)        # add constant feature
T  = X1.dot(W[0].T)  # linear response
H  = Sig( T )        # activation f'n
H1 = _add1(H)        # add constant feature
S  = H1.dot(W[1].T)  # linear response
H2 = Sig( S )        # activation f'n
# ...
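The `_add1` helper is not defined on the slide; a minimal sketch of what it presumably does (prepend the constant-1 feature column to a matrix of examples, NumPy assumed):

    import numpy as np

    def _add1(X):
        """Prepend a column of ones (the constant feature) to X."""
        return np.hstack([np.ones((X.shape[0], 1)), X])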
[Diagram: X -(W[0])-> H1 -(W[1])-> H2; information flows left to right.]
Feed-forward networks
A note on multiple outputs:
• Regression:
– Predict multi-dimensional y
– "Shared" representation = fewer parameters
• Classification:
– Predict a binary vector
– Multi-class classification: y = 2 is encoded as [0 0 1 0 …]
– Multiple, joint binary predictions (image tagging, etc.)
– Often trained as regression (MSE), with saturating activation
Machine Learning
Backpropagation Learning
Multi-Layer Perceptrons
Convolutional Neural Networks
• Observe features x with target y
• Push x through the NN; the output is ŷ
• Error: (y - ŷ)²  (can use different loss functions if desired…)
• How should we update the weights to improve?
• Single layer
– Logistic sigmoid function
– Smooth, differentiable
• Optimize using:
– Batch gradient descent
– Stochastic gradient descent
[Diagram: inputs -> hidden layer -> outputs.]
Training MLPs
Gradient calculations
• Think of NNs as "schematics" made of smaller functions
– Building blocks: summations & nonlinearities
– For derivatives, just apply the chain rule!
[Diagram: inputs -> hidden layer -> outputs.]
Ex: f(g, h) = g²h, so ∂f/∂g = 2gh and ∂f/∂h = g²: save & reuse the info (g, h) from the forward computation!
• Just gradient descent…
• Apply the chain rule to the MLP
Forward pass: $t_j = \sum_i w^{(1)}_{ji} x_i$, $h_j = \sigma(t_j)$; then $s_k = \sum_j w^{(2)}_{kj} h_j$, $\hat{y}_k = \sigma(s_k)$
Loss function: $J = \sum_k (y_k - \hat{y}_k)^2$
Output layer: $\partial J / \partial w^{(2)}_{kj} = -2\,(y_k - \hat{y}_k)\,\sigma'(s_k)\,h_j$
(Identical to logistic MSE regression with inputs $h_j$.)
Backpropagation
• Just gradient descent…
• Apply the chain rule to the MLP, one layer further back:
Hidden layer: $\partial J / \partial w^{(1)}_{ji} = \big[\sum_k -2\,(y_k - \hat{y}_k)\,\sigma'(s_k)\,w^{(2)}_{kj}\big]\,\sigma'(t_j)\,x_i$
Backpropagation
• Just gradient descent…
• Apply the chain rule to the MLP

Forward pass:
# X : (1,N1);  W[0] : (N2,N1);  W[1] : (N3,N2)
T  = X.dot(W[0].T)     # linear response    (1,N2)
H  = Sig(T)            # hidden activations (1,N2)
S  = H.dot(W[1].T)     # linear response    (1,N3)
Yh = Sig(S)            # outputs            (1,N3)

Backward pass (loss function -> output layer -> hidden layer):
B2 = (Y - Yh) * dSig(S)      # (1,N3)
G2 = B2.T.dot( H )           # (N3,1)x(1,N2) = (N3,N2), gradient term for W[1]
B1 = B2.dot(W[1]) * dSig(T)  # (1,N3)x(N3,N2) = (1,N2)
G1 = B1.T.dot( X )           # (N2,1)x(1,N1) = (N2,N1), gradient term for W[0]
Backpropagation
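Continuing that snippet, a runnable sketch of a full gradient-descent loop (the logistic Sig/dSig, layer sizes, and data below are illustrative assumptions, not from the slide). Note that as defined, G2 and G1 already point downhill on the squared error, so the update adds them:

    import numpy as np

    def Sig(r):  return 1.0 / (1.0 + np.exp(-r))     # logistic sigmoid
    def dSig(r): return Sig(r) * (1.0 - Sig(r))      # its derivative

    rng = np.random.default_rng(0)
    N1, N2, N3 = 3, 5, 2                             # layer sizes (assumed)
    W = [rng.normal(size=(N2, N1)), rng.normal(size=(N3, N2))]
    X = rng.normal(size=(1, N1)); Y = np.array([[0., 1.]])

    lr = 0.1                                         # assumed step size
    for it in range(100):                            # plain gradient descent
        T  = X.dot(W[0].T); H  = Sig(T)              # forward pass
        S  = H.dot(W[1].T); Yh = Sig(S)
        B2 = (Y - Yh) * dSig(S)                      # backward pass
        G2 = B2.T.dot(H)
        B1 = B2.dot(W[1]) * dSig(T)
        G1 = B1.T.dot(X)
        W[1] += lr * G2                              # descent: add, since the
        W[0] += lr * G1                              # G's are negative gradients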
Example: Regression, MCycle data
• Train NN model, 2 layers
– 1 input feature => 1 input unit
– 10 hidden units
– 1 target => 1 output unit
– Logistic sigmoid activation for the hidden layer, linear activation for the output layer
[Plots: data (+) and the learned prediction function; the responses of the hidden nodes (= features of the linear regression) select out useful regions of x.]
Example: Classification, Iris data
• Train NN model, 2 layers
– 2 input features => 2 input units
– 10 hidden units
– 3 classes => 3 output units (y = [0 0 1], etc.)
– Logistic sigmoid activation functions
– Optimize MSE of the predictions using stochastic gradient descent (a sketch follows below)
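A minimal sketch of that training setup (random stand-in data rather than the real Iris measurements; bias terms omitted for brevity): one-hot encode the 3 classes and take stochastic gradient steps in random order:

    import numpy as np

    def Sig(r):  return 1.0 / (1.0 + np.exp(-r))
    def dSig(r): return Sig(r) * (1.0 - Sig(r))

    rng = np.random.default_rng(1)
    X = rng.normal(size=(150, 2))          # stand-in for the 2 input features
    c = rng.integers(0, 3, size=150)       # stand-in class labels 0/1/2
    Y = np.eye(3)[c]                       # one-hot: class 2 -> [0 0 1]

    W = [rng.normal(size=(10, 2)), rng.normal(size=(3, 10))]  # 2-10-3 network
    lr = 0.05
    for epoch in range(20):
        for i in rng.permutation(len(X)):  # stochastic gradient: one example
            x, y = X[i:i+1], Y[i:i+1]
            T = x.dot(W[0].T); H = Sig(T)
            S = H.dot(W[1].T); Yh = Sig(S)
            B2 = (y - Yh) * dSig(S)
            B1 = B2.dot(W[1]) * dSig(T)
            W[1] += lr * B2.T.dot(H)
            W[0] += lr * B1.T.dot(x)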
Demo Time!
http://playground.tensorflow.org/
MLPs in practice
• Example: Deep belief nets
– Handwriting recognition
– Online demo
– 784 pixels ↔ 500 mid ↔ 500 high ↔ 2000 top ↔ 10 labels
[Diagram: x ↔ h1 ↔ h2 ↔ h3 ↔ ŷ.]
[Hinton et al. 2007]
MLPs in practice
• Example: Deep belief nets (same network as above)
Fix output, simulate inputs
[Hinton et al. 2007]
Machine Learning
Convolutional Neural Networks
Multi-Layer Perceptrons
Backpropagation Learning
Convolutional networks
• Organize & share the NN's weights (vs. "dense" layers)
• Group weights into "filters"
Input: 28×28 image; weights: 5×5 filter
Convolutional networks
• Organize & share the NN's weights (vs. "dense" layers)
• Group weights into "filters" & convolve across the input image
Input: 28×28 image; weights: 5×5 filter; filter response at each patch
Run over all patches of input => activation map (a 24×24 image)
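A sketch of the convolution itself in plain NumPy (random image and filter as stand-ins): sliding one 5x5 filter over every patch of a 28x28 input leaves a 24x24 activation map:

    import numpy as np

    rng = np.random.default_rng(0)
    img  = rng.normal(size=(28, 28))   # stand-in input image
    filt = rng.normal(size=(5, 5))     # one 5x5 filter

    H, W = img.shape[0] - 4, img.shape[1] - 4   # 24 x 24 "valid" positions
    amap = np.zeros((H, W))
    for i in range(H):                 # filter response at each patch
        for j in range(W):
            amap[i, j] = np.sum(img[i:i+5, j:j+5] * filt)
    # amap.shape == (24, 24): the activation map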
Convolutional networks
• Organize & share the NN's weights (vs. "dense" layers)
• Group weights into "filters" & convolve across the input image
Input: 28×28 image; weights: another 5×5 filter
Run over all patches of input => another activation map
Convolutional networks
• Organize & share the NN's weights (vs. "dense" layers)
• Group weights into "filters" & convolve across the input image
• Many hidden nodes, but few parameters!
Input: 28×28 image; weights: 5×5; hidden layer 1
Convolutional networks
• Again, can view components as building blocks
• Design the overall, deep structure from parts:
– Convolutional layers
– "Max-pooling" (sub-sampling) layers (a sketch follows below)
– Densely connected layers
LeNet-5 [LeCun et al., 1998]
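A minimal sketch of the max-pooling ("sub-sampling") block, assuming 2x2 non-overlapping windows over an activation map with even dimensions:

    import numpy as np

    def maxpool2x2(a):
        """Downsample by taking the max over each 2x2 block."""
        H, W = a.shape
        return a.reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

    amap  = np.arange(16.0).reshape(4, 4)
    small = maxpool2x2(amap)    # shape (2, 2): keeps the strongest responses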
Ex: AlexNet
• Deep NN model for ImageNet classification
– 650K units; 60M parameters
– ~1M training images; ~1 week of training (on GPUs)
[Diagram: 224×224×3 input -> 5 convolutional layers -> 3 dense layers -> output (1000 classes).]
[Krizhevsky et al. 2012]
Hidden layers as "features"
• Visualizing a convolutional network's filters
Slide image from Yann LeCun: https://drive.google.com/open?id=0BxKBnD5y2M8NclFWSXNxa0JlZTg
[Zeiler & Fergus 2013]
Dropout
• Another recent technique
– Randomly "block" some neurons at each step
– Trains the model to have redundancy (predictions must be robust to blocking)
[Diagrams: the full network vs. the same network with sampled neurons removed (inputs -> hidden layers -> output).]
Each training prediction: sample neurons to remove
[Srivastava et al. 2014]
# ... during training ...
R  = X.dot(W[0]) + B[0]                # linear response
H1 = Sig( R )                          # activation f'n
H1 *= np.random.rand(*H1.shape) < p    # drop out!
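A usage note (standard practice, not stated on the slide): with this keep-with-probability-p masking applied only during training, test-time predictions should compensate, either by scaling activations by p at test time, or by dividing by p during training ("inverted dropout") so the test-time code is unchanged.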
Neural networks & DBNs
• Want to try them out?
• Matlab "Deep Learning Toolbox": https://github.com/rasmusbergpalm/DeepLearnToolbox
• PyLearn2: https://github.com/lisa-lab/pylearn2
• TensorFlow
Summary
• Neural networks, multi-layer perceptrons
• Cascade of simple perceptrons
– Each just a linear classifier
– Hidden units used to create new features
• Together, general function approximators
– Enough hidden units (features) = any function
– Can create nonlinear classifiers
– Also used for function approximation, regression, …
• Training via backprop
– Gradient descent; logistic; apply the chain rule. Building-block view.
• Advanced: deep nets, conv nets, dropout, …