Neural Net Language Models (Deep Learning and Neural Nets, Spring 2015)
A Brief Overview of Neural Networks
Overview
• Relation to the biological brain: biological neural networks
• The artificial neuron
• Types of networks and learning techniques
• Supervised learning and the backpropagation training algorithm
• Learning by example
• Applications
Biological Neuron
Artificial Neuron

[Diagram: each input is multiplied by a weight W; the weighted inputs are summed (Σ) and passed through an activation function f(n) to produce the neuron's output.]
Transfer Functions

• Sigmoid: f(n) = 1 / (1 + e^(−n)), with output between 0 and 1
• Linear: f(n) = n
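As a sketch, the two transfer functions look like this in code (the function names are illustrative):

```python
import math

def sigmoid(n):
    """Sigmoid transfer function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def linear(n):
    """Linear transfer function: passes the net input through unchanged."""
    return n

# The sigmoid saturates toward 0 for large negative inputs
# and toward 1 for large positive inputs.
print(sigmoid(-5))   # close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(5))    # close to 1
print(linear(0.25))  # 0.25
```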
Types of Networks
• Multiple inputs, single layer
• Multiple inputs, multiple layers
Types of Networks – Contd.
• Feedback (recurrent) networks
Recurrent Networks
• Feed-forward networks:
– Information only flows one way
– One input pattern produces one output
– No sense of time (or memory of previous state)
• Recurrency:
– Nodes connect back to other nodes or themselves
– Information flow is multidirectional
– Sense of time and memory of previous state(s)
• Biological nervous systems show high levels of recurrency (but feed-forward structures exist too)
ANNs – The basics
• ANNs incorporate the two fundamental components of biological neural nets:
1. Neurones (nodes)
2. Synapses (weights)
Feed-forward nets
• Information flow is unidirectional
• Data is presented to the input layer
• Passed on to the hidden layer
• Passed on to the output layer
• Information is distributed
• Information processing is parallel

The hidden layer forms an internal representation (interpretation) of the data.
Neural networks are a good fit for prediction problems when:
• The inputs are well understood. You have a good idea of which features of the data are important, but not necessarily how to combine them.
• The output is well understood. You know what you are trying to predict.
• Experience is available. You have plenty of examples where both the inputs and the output are known. This experience will be used to train the network.
• Feeding data through the net:

(1 × 0.25) + (0.5 × (−1.5)) = 0.25 + (−0.75) = −0.5

Squashing: 1 / (1 + e^0.5) = 0.3775
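This forward step can be checked in a few lines (an illustrative sketch using the slide's numbers):

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

# Inputs and weights from the slide's example
inputs  = [1.0, 0.5]
weights = [0.25, -1.5]

# Weighted sum: (1)(0.25) + (0.5)(-1.5) = -0.5
net = sum(x * w for x, w in zip(inputs, weights))

# Squash through the sigmoid: 1 / (1 + e^{0.5}) = 0.3775
output = sigmoid(net)
print(round(net, 4), round(output, 4))  # -0.5 0.3775
```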
Learning Techniques
• Supervised Learning:

[Diagram: inputs from the environment feed both the neural network and the actual system; the network's actual output is subtracted from the expected output, and the resulting error is used to train the network.]
Multilayer Perceptron

[Diagram: inputs → first hidden layer → second hidden layer → output layer]
Signal Flow: Backpropagation of Errors

[Diagram: function signals flow forward through the network, while error signals propagate backward.]
Neural networks for Directed Data Mining: Building a model for classification and prediction
1. Identify the input and output features.
2. Normalize (scale) the inputs and outputs so their range is between 0 and 1.
3. Set up a network with a representative set of training examples.
4. Train the network on a representative set of training examples.
5. Test the network on a test set strictly independent from the training examples. If necessary, repeat the training, adjusting the training set, network topology, and parameters. Evaluate the network using the evaluation set to see how well it performs.
6. Apply the model generated by the network to predict outcomes for unknown inputs.
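Step 2's normalization can be sketched with a simple min-max scaler (an illustrative helper, not from the slides):

```python
def min_max_scale(values):
    """Min-max normalization: map values into [0, 1] (step 2 above)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 15, 30]))  # [0.0, 0.5, 0.25, 1.0]
```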
Learning by Example
• Hidden layer transfer function: sigmoid, F(n) = 1/(1 + exp(−n)), where n is the net input to the neuron.
  Derivative: F′(n) = (output of the neuron)(1 − output of the neuron), the slope of the transfer function.
• Output layer transfer function: linear, F(n) = n; output = input to the neuron.
  Derivative: F′(n) = 1
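A quick sketch of the sigmoid and its output-based derivative (the function names are illustrative; the sample net input 0.6225 anticipates the worked example later):

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def sigmoid_deriv_from_output(out):
    """Slope of the sigmoid, written in terms of the neuron's output:
    F'(n) = F(n) * (1 - F(n))."""
    return out * (1.0 - out)

out = sigmoid(0.6225)   # ~0.6508
print(round(sigmoid_deriv_from_output(out), 4))  # 0.2273
```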
Purpose of the Activation Function
• We want the unit to be “active” (near +1) when the “right” inputs are given
• We want the unit to be “inactive” (near 0) when the “wrong” inputs are given.
• It’s preferable for the activation function to be nonlinear; otherwise, the entire neural network collapses into a simple linear function.
Possibilities for activation function

Step function, sign function, and sigmoid (logistic) function:

step(x) = 1 if x > threshold, 0 if x ≤ threshold (in the picture above, threshold = 0)
sign(x) = +1 if x > 0, −1 if x ≤ 0
sigmoid(x) = 1/(1 + e^(−x))

Adding an extra input with activation a0 = −1 and weight W0,j = t (called the bias weight) is equivalent to having a threshold at t. This way we can always assume a 0 threshold.
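These activation functions, and the bias-weight trick, can be sketched as follows (the numbers in the demonstration are made up):

```python
import math

def step(x, threshold=0.0):
    return 1 if x > threshold else 0

def sign(x):
    return 1 if x > 0 else -1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Bias trick: a threshold-t unit equals a threshold-0 unit with an
# extra input a0 = -1 whose weight is t.
t = 1.5
x, w = 2.0, 1.0
with_threshold = step(x * w, threshold=t)
with_bias      = step(x * w + (-1) * t, threshold=0.0)
print(with_threshold == with_bias)  # True
```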
Using a Bias Weight to Standardize the Threshold

[Diagram: inputs x1 and x2 with weights W1 and W2, plus a bias input of −1 with weight T.]

W1x1 + W2x2 < T is equivalent to W1x1 + W2x2 − T < 0
Real vs artificial neurons

[Diagram: a biological neuron with dendrites, cell body, synapses, and axon, beside an artificial threshold unit with inputs x0 … xn, weights w0 … wn, and output o.]

Threshold units compute a weighted sum of their inputs:

net = Σ (i = 0..n) wi·xi

o = 1 if Σ (i = 0..n) wi·xi > 0, and 0 otherwise
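As a sketch, a threshold unit of this kind might be implemented as follows (the function name and example numbers are illustrative):

```python
def threshold_unit(inputs, weights):
    """A simple threshold unit: fires (outputs 1) when the weighted
    sum of its inputs is positive, otherwise outputs 0."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net > 0 else 0

print(threshold_unit([1.0, 0.5], [0.6, -0.4]))  # 0.6 - 0.2 = 0.4 > 0 -> 1
```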
Implementing AND

Assume Boolean (0/1) input values. Inputs x1 and x2 each have weight 1; a bias input of −1 has weight W = 1.5:

o(x1, x2) = 1 if −1.5 + x1 + x2 > 0, and 0 otherwise
Implementing OR

Inputs x1 and x2 each have weight 1; a bias input of −1 has weight W = 0.5:

o(x1, x2) = 1 if −0.5 + x1 + x2 > 0, and 0 otherwise
Implementing NOT

Input x1 has weight −1; a bias input of −1 has weight W = −0.5:

o(x1) = 1 if 0.5 − x1 > 0, and 0 otherwise
Implementing more complex Boolean functions

[Diagram: a first unit computes x1 or x2 (weights 1 and 1, bias weight 0.5); its output feeds a second unit together with x3 (weights 1 and 1, bias weight 1.5) to compute (x1 or x2) and x3.]
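The three gates and their composition can be sketched directly from these weights (the helper `unit` and its bias convention are illustrative; the bias weights 1.5, 0.5, and −0.5 are from the slides):

```python
def unit(inputs, weights, bias_weight):
    """Threshold unit with a bias input of -1: fires when
    (sum of weighted inputs) - bias_weight > 0."""
    net = sum(x * w for x, w in zip(inputs, weights)) - bias_weight
    return 1 if net > 0 else 0

def AND(x1, x2): return unit([x1, x2], [1, 1], 1.5)
def OR(x1, x2):  return unit([x1, x2], [1, 1], 0.5)
def NOT(x1):     return unit([x1], [-1], -0.5)

# (x1 or x2) and x3, built by feeding one unit's output into another
def or_then_and(x1, x2, x3):
    return AND(OR(x1, x2), x3)

print([AND(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([OR(a, b)  for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 1]
print([NOT(a)    for a in (0, 1)])                  # [1, 0]
print(or_then_and(1, 0, 1))                         # 1
```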
Learning by Example
• Training algorithm: backpropagation of errors using gradient-descent training.
• Colors:
– Red: current weights
– Orange: updated weights
– Black boxes: inputs and outputs of a neuron
– Blue: sensitivities at each layer
• The perceptron learning rule performs gradient descent in weight space.
– Error surface: the surface that describes the error on each example as a function of all the weights in the network. A set of weights defines a point on this surface. (It could also be called a state in the state space of possible weights, i.e., weight space.)
– We look at the partial derivative of the surface with respect to each weight (i.e., the gradient: how much the error would change if we made a small change in that weight). The weights are then altered by an amount proportional to the slope in each direction (corresponding to a weight). Thus the network as a whole moves in the direction of steepest descent on the error surface.
Definition of Error: Sum of Squared Errors

E = ½ Σ (over examples) (t − o)² = ½ Err²

Here, t is the correct (desired) output and o is the actual output of the neural net, so Err = t − o. The factor of ½ is introduced to simplify the math on the next slide.
Reduction of Squared Error

Gradient descent reduces the squared error by calculating the partial derivative of E with respect to each weight:

∂E/∂Wj = Err × ∂Err/∂Wj (chain rule for derivatives)
= Err × ∂[t − g(Σk Wk xk)]/∂Wj (expanding Err to (t − g(in)), where Σk Wk xk is called "in")
= −Err × g′(in) × xj (because ∂t/∂Wj = 0, and the chain rule)

The weight is then updated by η times this gradient of the error E in weight space:

Wj ← Wj + η × Err × g′(in) × xj

Here η is the learning rate, typically set to a small value such as 0.1. That the weight is updated in the correct direction (+/−) can be verified with examples. Note that the gradient of E over all the weights is a vector.
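The update rule can be sketched for a single sigmoid unit. This is an illustrative implementation, not code from the slides; `train_step`, the toy input, and the target are assumptions for the demonstration.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train_step(weights, x, t, eta=0.1):
    """One gradient-descent update of the rule
    Wj <- Wj + eta * Err * g'(in) * xj  for a single sigmoid unit."""
    net = sum(w * xi for w, xi in zip(weights, x))   # "in"
    o = sigmoid(net)
    err = t - o
    slope = o * (1.0 - o)                            # g'(in) for the sigmoid
    return [w + eta * err * slope * xi for w, xi in zip(weights, x)]

# Toy usage: push the unit's output toward 1 for the input [1.0, 0.5]
w = [0.2, -0.3]
for _ in range(100):
    w = train_step(w, [1.0, 0.5], 1.0)
print(sigmoid(w[0] * 1.0 + w[1] * 0.5) > 0.6)  # output has moved toward 1
```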
First Pass

The example network has one input (value 1), two sigmoid neurons in each of two hidden layers, and a single linear output neuron; all weights start at 0.5, and the weights within a layer share the same value.

First-hidden-layer outputs: sigmoid(0.5 × 1) = 0.6225
Second-hidden-layer outputs: sigmoid(0.5 × 0.6225 + 0.5 × 0.6225) = sigmoid(0.6225) = 0.6508
Output (linear): 0.5 × 0.6508 + 0.5 × 0.6508 = 0.6508

Error = 1 − 0.6508 = 0.3492
G3 = (1)(0.3492) = 0.3492
G2 = (0.6508)(1 − 0.6508)(0.3492)(0.5) = 0.0397
G1 = (0.6225)(1 − 0.6225)(0.0397)(0.5)(2) = 0.0093

Gradient of a hidden neuron = G = slope of the transfer function × [Σ{(gradient of the next neuron) × (weight connecting the neuron to the next neuron)}]
Gradient of the output neuron = slope of the transfer function × error
Weight Update 1

New weight = old weight + {(learning rate)(gradient)(prior output)}, with learning rate 0.5:

w3 = 0.5 + (0.5)(0.3492)(0.6508) = 0.6136
w2 = 0.5 + (0.5)(0.0397)(0.6225) = 0.5124
w1 = 0.5 + (0.5)(0.0093)(1) = 0.5047
Second Pass

First-hidden-layer outputs: sigmoid(0.5047 × 1) = 0.6236
Second-hidden-layer outputs: sigmoid(0.5124 × 0.6236 × 2) = sigmoid(0.6391) = 0.6545
Output (linear): 0.6136 × 0.6545 × 2 = 0.8033

Error = 1 − 0.8033 = 0.1967
G3 = (1)(0.1967) = 0.1967
G2 = (0.6545)(1 − 0.6545)(0.1967)(0.6136) = 0.0273
G1 = (0.6236)(1 − 0.6236)(0.5124)(0.0273)(2) = 0.0066
Weight Update 2

New weight = old weight + {(learning rate)(gradient)(prior output)}:

w3 = 0.6136 + (0.5)(0.1967)(0.6545) = 0.6779
w2 = 0.5124 + (0.5)(0.0273)(0.6236) = 0.5209
w1 = 0.5047 + (0.5)(0.0066)(1) = 0.508
Third Pass

First-hidden-layer outputs: sigmoid(0.508 × 1) = 0.6243
Second-hidden-layer outputs: sigmoid(0.5209 × 0.6243 × 2) = sigmoid(0.6504) = 0.6571
Output (linear): 0.6779 × 0.6571 × 2 = 0.8909
Weight Update Summary

| | w1 | w2 | w3 | Output | Expected Output | Error |
|---|---|---|---|---|---|---|
| Initial conditions | 0.5 | 0.5 | 0.5 | 0.6508 | 1 | 0.3492 |
| Pass 1 update | 0.5047 | 0.5124 | 0.6136 | 0.8033 | 1 | 0.1967 |
| Pass 2 update | 0.508 | 0.5209 | 0.6779 | 0.8909 | 1 | 0.1091 |

Weights:
W1: weights from the input to the first hidden layer
W2: weights from the first hidden layer to the second hidden layer
W3: weights from the second hidden layer to the output layer
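The passes above can be reproduced in a short script. This is a minimal sketch assuming the topology used in the worked example (one input, two tied-weight sigmoid neurons per hidden layer, one linear output neuron); the function names `forward` and `backprop_step` are illustrative, and printed values match the summary table up to the slides' rounding.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def forward(w1, w2, w3, x=1.0):
    """Forward pass: one input, two tied-weight sigmoid neurons per
    hidden layer, and one linear output neuron summing both."""
    h1 = sigmoid(w1 * x)        # output of each first-layer neuron
    h2 = sigmoid(2 * w2 * h1)   # output of each second-layer neuron
    return h1, h2, 2 * w3 * h2  # linear output neuron

def backprop_step(w1, w2, w3, target=1.0, eta=0.5, x=1.0):
    h1, h2, out = forward(w1, w2, w3, x)
    err = target - out
    g3 = 1.0 * err                    # linear slope = 1
    g2 = h2 * (1 - h2) * g3 * w3      # sigmoid slope times (g3 * w3)
    g1 = h1 * (1 - h1) * g2 * w2 * 2  # two second-layer neurons feed back
    # new weight = old weight + eta * gradient * prior output
    return w1 + eta * g1 * x, w2 + eta * g2 * h1, w3 + eta * g3 * h2

w1 = w2 = w3 = 0.5
for p in range(3):
    _, _, out = forward(w1, w2, w3)
    print(f"pass {p}: output={out:.4f}, w=({w1:.4f}, {w2:.4f}, {w3:.4f})")
    w1, w2, w3 = backprop_step(w1, w2, w3)
```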
Training Algorithm
• The process of feedforward and backpropagation continues until the required mean squared error has been reached.
• Typical MSE: 1e-5
• Other, more sophisticated backpropagation training algorithms are also available.
Why Gradient?

[Diagram: two neurons with outputs O1 and O2 feed a sigmoid neuron through weights W1 and W2.]

O = output of a neuron, W = weight, N = net input to the neuron

N = (O1 × W1) + (O2 × W2)
O3 = 1 / [1 + exp(−N)]
Error = actual output − O3

• To reduce the error, the change in each weight depends on:
– the learning rate
– the rate of change of error w.r.t. the rate of change of weight, made up of:
  · Gradient: rate of change of error w.r.t. rate of change of N
  · Prior output (O1 and O2)
Gradient in Detail
• Gradient: rate of change of error w.r.t. rate of change in the net input to a neuron
– For output neurons: slope of the transfer function × error
– For hidden neurons: a bit complicated! The error is fed back in terms of the gradients of the successive neurons:
  slope of the transfer function × [Σ (gradient of the next neuron × weight connecting the neuron to the next neuron)]
  Why a summation? Share the responsibility!!
An Example

Two hidden neurons with outputs sigmoid(1) = 0.731 and sigmoid(0.4) = 0.598 feed two sigmoid output neurons, with all connecting weights equal to 0.5. Each output neuron receives net input (0.5 × 0.731) + (0.5 × 0.598) = 0.6645 and produces sigmoid(0.6645) = 0.66. The targets are 1 and 0.

Error (output 1) = 1 − 0.66 = 0.34
Error (output 2) = 0 − 0.66 = −0.66

Gradient of output 1 = 0.66 × (1 − 0.66) × (0.34) = 0.0763
Gradient of output 2 = 0.66 × (1 − 0.66) × (−0.66) = −0.148

The weights into output 2 are reduced by more than the weights into output 1 are increased.
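The two output gradients can be recomputed as a check (an illustrative sketch; the output is rounded to 0.66 mid-computation, as the slide does):

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

# Hidden-layer outputs from the slide: sigmoid(1) = 0.731, sigmoid(0.4) = 0.598
h = [sigmoid(1.0), sigmoid(0.4)]

# Each output neuron sees net input (0.5)(0.731) + (0.5)(0.598) = 0.6645
net = 0.5 * h[0] + 0.5 * h[1]
o = round(sigmoid(net), 2)   # 0.66, rounded as on the slide

# Gradient of each output neuron: slope x error = o(1 - o)(t - o)
targets = [1.0, 0.0]
grads = [o * (1 - o) * (t - o) for t in targets]
print([round(g, 4) for g in grads])  # [0.0763, -0.1481]
```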
Improving performance
• Changing the number of layers and the number of neurons in each layer
• Varying the transfer functions
• Changing the learning rate
• Training for longer times
• Changing the type of pre-processing and post-processing
Applications
• Used in complex function approximations, feature extraction & classification, and optimization & control problems
• Applicability in all areas of science and technology.
Neural networks with more than one output
– A department store chain wants to predict the likelihood that customers will purchase products from various departments, such as women's apparel, furniture, and entertainment.
– The store wants to use this information to plan promotions and direct targeted mailings.
– This network has three outputs, one for each department. Each output is the propensity of the customer described by the inputs to make his or her next purchase from the associated department.
[Diagram: inputs (gender, age, last purchase) feed a network with three outputs: propensity to purchase women's apparel, propensity to purchase furniture, and propensity to purchase entertainment.]
How can the department store determine the right promotion or promotions to offer the customer?
• Taking the department corresponding to the unit with the maximum value
• Taking the departments corresponding to the units with the top three values
• Taking the departments corresponding to the units that exceed some threshold value
• Taking all departments corresponding to units that are within some percentage of the unit with the maximum value