Neural Networks - uni-hannover.de
Neural Networks
Slides from: Doug Gray, David Poole
What is a Neural Network?
• An information processing paradigm inspired by the way biological nervous systems, such as the brain, process information
• A method of computing based on the interaction of multiple connected processing elements
What can a Neural Net do?
• Compute a known function
• Approximate an unknown function
• Pattern Recognition
• Signal Processing
• Learn to do any of the above
Basic Concepts
[Diagram: a neural network mapping Input 0 … Input n to Output 0 … Output m]
A Neural Network generally maps a set of inputs to a set of outputs.
The number of inputs/outputs is variable.
The network itself is composed of an arbitrary number of nodes with an arbitrary topology.
Basic Concepts
Definition of a node:
• A node is an element which performs the function y = fH(Σ(wi · xi) + Wb)
[Diagram: a node sums the weighted inputs (weights W0 … Wn) together with the bias weight Wb, then applies fH to produce the output; circles are nodes, arrows are connections]
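The node definition above can be sketched directly in code. This is a minimal illustration (the function and variable names are ours, not from the slides), with a hard threshold standing in for fH:

```python
# Sketch of a single node: weighted sum of the inputs plus the bias
# weight Wb, passed through an activation function fH.
def node_output(inputs, weights, bias_weight, activation):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias_weight
    return activation(total)

# Example activation: a linear threshold u(x) = 1 if x > 0, else 0.
step = lambda x: 1 if x > 0 else 0

y = node_output([1, 0], [0.5, 0.5], -0.25, step)  # sum = 0.25, so y = 1
```

Any activation function can be plugged in the same way, which is what later slides exploit when they swap the hard threshold for a sigmoid.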
Properties
• Inputs are flexible: any real values, highly correlated or independent
• Target function may be discrete-valued, real-valued, or vectors of discrete or real values
• Outputs are real numbers between 0 and 1
• Resistant to errors in the training data
• Long training time
• Fast evaluation
• The function produced can be difficult for humans to interpret
Perceptrons
• Basic unit in a neural network
• Linear separator
• Parts:
  - N inputs, x1 … xn
  - Weights for each input, w1 … wn
  - A bias input x0 (constant) and associated weight w0
  - Weighted sum of inputs, y = w0x0 + w1x1 + … + wnxn
  - A threshold function (activation function), e.g. 1 if y > 0, -1 if y <= 0
Diagram
[Diagram: inputs x0 … xn with weights w0 … wn feed a summation y = Σ wixi, followed by a threshold that outputs 1 if y > 0 and -1 otherwise]
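The perceptron in the diagram can be sketched in a few lines. This is an illustrative sketch (names are ours); the bias is handled by making x0 a constant input of 1, as the parts list describes:

```python
# Perceptron sketch: weighted sum of inputs, then a hard threshold
# that outputs 1 if the sum is positive and -1 otherwise.
def perceptron(xs, ws):
    # xs[0] is the constant bias input x0 = 1, paired with weight ws[0].
    y = sum(w * x for w, x in zip(ws, xs))
    return 1 if y > 0 else -1

out = perceptron([1, 2.0, -1.0], [-0.5, 1.0, 0.5])  # sum = 1.0, so out = 1
```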
Typical Activation Functions
F(x) = 1 / (1 + e^(-x))
Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions.
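The sigmoid above is one line of code. Unlike the hard threshold, it is smooth and differentiable everywhere, which is what gradient-based training later in these slides relies on:

```python
import math

# The logistic (sigmoid) activation from the slide: F(x) = 1 / (1 + e^(-x)).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# For large |x| it behaves like a threshold: sigmoid(10) is close to 1,
# sigmoid(-10) is close to 0, and sigmoid(0) is exactly 0.5.
```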
Simple Perceptron
Binary logic application
fH(x) = u(x) [linear threshold]
Wi = random(-1, 1)
Y = u(W0X0 + W1X1 + Wb)
Now how do we train it?
[Diagram: a two-input perceptron with weights W0 and W1, bias weight Wb, and activation fH]
Basic Training
Perceptron learning rule: ΔWi = η · (D − Y) · Xi
η = learning rate
D = desired output
Adjust weights based on how well the current weights match the objective.
Logic Training
Expose the network to the logical OR operation.
Update the weights after each epoch.
As the output approaches the desired output for all cases, ΔWi will approach 0.
X0  X1  D
0   0   0
0   1   1
1   0   1
1   1   1
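Putting the learning rule and the OR table together gives a complete training loop. This is a sketch under the slides' setup (u(x) threshold, random initial weights in (-1, 1)); the learning rate of 0.1 and the epoch count are our choices:

```python
import random

# Train a perceptron on logical OR with the rule dWi = eta * (D - Y) * Xi.
random.seed(0)
w0, w1, wb = (random.uniform(-1, 1) for _ in range(3))
eta = 0.1
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

u = lambda x: 1 if x > 0 else 0  # linear threshold fH(x) = u(x)

for epoch in range(100):
    for (x0, x1), d in data:
        y = u(w0 * x0 + w1 * x1 + wb)
        w0 += eta * (d - y) * x0
        w1 += eta * (d - y) * x1
        wb += eta * (d - y) * 1  # the bias input is the constant 1
```

Because OR is linearly separable, the updates shrink to zero once every case is classified correctly, exactly as the slide states.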
Results
[Plot: the weights W0, W1, Wb]
Details
The network converges on a hyper-plane decision surface: setting W0X0 + W1X1 + Wb = 0 gives X1 = -(W0/W1)X0 - (Wb/W1)
[Plot: the decision line in the (X0, X1) plane]
Feed-forward neural networks
Feed-forward neural networks are the most common models. These are directed acyclic graphs:
Neural Network for the news example
Axiomatizing the Network
The values of the attributes are real numbers. Thirteen parameters w0, …, w12 are real numbers. The attributes h1 and h2 correspond to the values of hidden units. There are 13 real numbers to be learned; the hypothesis space is thus a 13-dimensional real space. Each point in this 13-dimensional space corresponds to a particular logic program that predicts a value for reads given known, new, short, and home.
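A forward pass through this network can be sketched as follows. The exact assignment of w0 … w12 to connections is our assumption (the slide only gives the count); what matters is the shape: 5 weights into each of h1 and h2, and 3 combining them into the output:

```python
import math

def f(x):
    """Sigmoid activation applied at each unit."""
    return 1.0 / (1.0 + math.exp(-x))

# Sketch of the 13-parameter news network: 4 inputs, 2 hidden units, 1 output.
# w is a list of the 13 parameters w0 .. w12 (indexing is illustrative).
def predict_reads(w, known, new, short, home):
    h1 = f(w[3] + w[4] * known + w[5] * new + w[6] * short + w[7] * home)
    h2 = f(w[8] + w[9] * known + w[10] * new + w[11] * short + w[12] * home)
    return f(w[0] + w[1] * h1 + w[2] * h2)
```

Counting confirms the slide: 5 + 5 + 3 = 13 real numbers, so each weight vector is one point in the 13-dimensional hypothesis space.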
Prediction Error
Neural Network Learning
Aim of neural network learning: given a set of examples, find parameter settings that minimize the error.
Back-propagation learning is gradient descent search through the parameter space to minimize the sum-of-squares error.
Backpropagation Learning
Inputs:
• A network, including all units and their connections
• Stopping criteria
• Learning rate (constant of proportionality of gradient descent search)
• Initial values for the parameters
• A set of classified training data
Output: updated values for the parameters
Backpropagation Learning Algorithm
Repeat:
• evaluate the network on each example given the current parameter settings
• determine the derivative of the error for each parameter
• change each parameter in proportion to its derivative
until the stopping criterion is met
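The loop above can be sketched on a deliberately tiny model. Note the simplification: real backpropagation computes the error derivatives analytically via the chain rule, while this sketch estimates them by finite differences just to keep the example short; the model y = w · x and all constants are our choices:

```python
# Gradient descent on the sum-of-squares error for a one-parameter model.
def sse(w, data):
    """Sum-of-squares error of the model y = w * x over (x, desired) pairs."""
    return sum((d - w * x) ** 2 for x, d in data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # consistent with w = 2
w, eta, h = 0.0, 0.01, 1e-6

for _ in range(1000):
    # Estimate the derivative of the error (backprop would compute it exactly).
    deriv = (sse(w + h, data) - sse(w - h, data)) / (2 * h)
    w -= eta * deriv          # change the parameter in proportion to it
    if abs(deriv) < 1e-6:     # stopping criterion
        break
```

The same three steps — evaluate, differentiate, update — scale up to the 13-parameter news network, with backpropagation supplying the derivatives efficiently.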
Gradient Descent for Neural Net Learning
Bias in neural networks and decision trees
It’s easy for a neural network to represent “at least two of I1, …, Ik are true”:
w0 = -15, w1 = … = wk = 10
This concept forms a large decision tree.
Consider representing a conditional: “If c then a else b”:
• Simple in a decision tree.
• Needs a complicated neural network to represent (c ∧ a) ∨ (¬c ∧ b).
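The “at least two of k” weights from the slide are easy to verify in code. With bias w0 = -15 and every input weight 10, the weighted sum crosses zero exactly when two or more inputs are true:

```python
# Check the "at least two of I1..Ik" unit: bias -15, input weights 10.
def at_least_two(inputs, w0=-15, wi=10):
    # One true input: -15 + 10 = -5 (off). Two true: -15 + 20 = 5 (on).
    return 1 if w0 + wi * sum(inputs) > 0 else 0
```

This single unit replaces what would be a large decision tree, illustrating the representational bias the slide contrasts.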
Neural Networks and Logic
Meaning is attached to the input and output units. There is no a priori meaning associated with the hidden units. What the hidden units actually represent is something that’s learned.