CSE 473: Introduction to Artificial Intelligence
Neural Networks
Henry Kautz
Spring 2006
Training a Single Neuron
• Idea: adjust weights to reduce sum of squared errors over the training set
  – Error = difference between actual and intended output
• Algorithm: gradient descent (sketched below)
  – Calculate the derivative (slope) of the error function
  – Take a small step in the “downward” direction
  – Step size is the “training rate”
• Single-layer network: can train each unit separately
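As a rough illustration (not from the original slides), here is a minimal gradient-descent loop in Python; the one-parameter error function E(w) = (w - 3)^2 and the training rate are placeholder choices:

# Minimal sketch of gradient descent on a one-parameter error function.
# The error E(w) = (w - 3)^2 and the training rate eta are illustrative
# assumptions, not values from the slides.
def error(w):
    return (w - 3.0) ** 2

def error_derivative(w):
    return 2.0 * (w - 3.0)          # slope of the error function

w = 0.0                             # initial weight
eta = 0.1                           # training rate (step size)
for step in range(50):
    w -= eta * error_derivative(w)  # small step in the "downward" direction
print(w)                            # approaches the minimum at w = 3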
Gradient Descent
E = \frac{1}{2}\sum_d (t_d - o_d)^2 = \frac{1}{2}\sum_d \left( t_d - g\Big(\sum_j w_j\, x_{j,d}\Big) \right)^2

where t_d is the target output, o_d = g(\sum_j w_j x_{j,d}) is the unit's actual output on training example d, and g is the squashing function.
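A small numeric check of this error function (the toy data, sigmoid squashing function, and weights below are assumptions for illustration):

import numpy as np

def g(z):                          # sigmoid squashing function (see later slides)
    return 1.0 / (1.0 + np.exp(-z))

# Toy training set: each row of X is one example d, t holds the target outputs.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([1.0, 0.0, 1.0])
w = np.array([0.2, -0.1])          # arbitrary weights

o = g(X @ w)                       # o_d = g(sum_j w_j x_{j,d}) for every example d
E = 0.5 * np.sum((t - o) ** 2)     # E = 1/2 sum_d (t_d - o_d)^2
print(E)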
Computing Partial Derivatives
\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
  = \frac{1}{2}\sum_d 2\,(t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d)
  = \sum_d (t_d - o_d)\,\frac{\partial}{\partial w_i}\left( t_d - g\Big(\sum_j w_j\, x_{j,d}\Big) \right)
  = -\sum_d (t_d - o_d)\; g'\Big(\sum_j w_j\, x_{j,d}\Big)\, x_{i,d}
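As a sanity check (not part of the original slides), the analytic gradient derived above can be compared with a finite-difference estimate; the toy data and weights are arbitrary:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    s = g(z)
    return s * (1.0 - s)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([1.0, 0.0, 1.0])
w = np.array([0.2, -0.1])

def E(w):
    return 0.5 * np.sum((t - g(X @ w)) ** 2)

# Analytic form: dE/dw_i = -sum_d (t_d - o_d) g'(in_d) x_{i,d}
in_d = X @ w
analytic = -((t - g(in_d)) * g_prime(in_d)) @ X

# Numerical estimate by central differences
eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(2)[i]) - E(w - eps * np.eye(2)[i])) / (2 * eps)
                    for i in range(2)])
print(analytic, numeric)           # the two should agree closely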
Single Unit Training Rule
Adjust weight i in proportion to…
• Training rate
• Error
• Derivative of the “squashing function”
• Degree to which input i was active
\Delta w_i = \eta \sum_d (t_d - o_d)\; g'\Big(\sum_j w_j\, x_{j,d}\Big)\, x_{i,d}
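A sketch of one batch update implementing this rule (the toy data, initial weights, and training rate are illustrative assumptions):

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    s = g(z)
    return s * (1.0 - s)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # one row per example d
t = np.array([1.0, 0.0, 1.0])                        # target outputs t_d
w = np.array([0.2, -0.1])
eta = 0.5                                            # training rate

in_d = X @ w                                         # weighted input per example
o = g(in_d)                                          # actual outputs o_d
w = w + eta * ((t - o) * g_prime(in_d)) @ X          # the rule above, for all i at once
print(w)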
Sigmoid Units
Using the sigmoid squashing function

g(in) = \frac{1}{1 + e^{-in}}

is nice because g'(in) = g(in)\,(1 - g(in)). So:

\Delta w_i
  = \eta \sum_d (t_d - o_d)\; g'\Big(\sum_j w_j\, x_{j,d}\Big)\, x_{i,d}
  = \eta \sum_d (t_d - o_d)\; g\Big(\sum_j w_j\, x_{j,d}\Big)\Big(1 - g\Big(\sum_j w_j\, x_{j,d}\Big)\Big)\, x_{i,d}
  = \eta \sum_d (t_d - o_d)\; o_d\,(1 - o_d)\; x_{i,d}
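A quick numeric confirmation of the identity g'(in) = g(in)(1 - g(in)); the test points are arbitrary:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 1.5])
eps = 1e-6
numeric = (g(z + eps) - g(z - eps)) / (2 * eps)   # g'(in) by central differences
identity = g(z) * (1.0 - g(z))                    # g(in)(1 - g(in))
print(numeric, identity)                          # the two should agree closely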
Sigmoid Unit Training Rule
Adjust weight i in proportion to…
• Training rate
• Error
• Degree to which output is ambiguous (o_d(1 - o_d) is largest when o_d ≈ 0.5)
• Degree to which input i was active

\Delta w_i = \eta \sum_d (t_d - o_d)\; o_d\,(1 - o_d)\; x_{i,d}
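Putting the pieces together: a minimal sigmoid unit trained with this rule on a toy OR problem. The data, bias handling, training rate, and epoch count are all illustrative assumptions:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy OR problem; the constant 1 in the last column acts as a bias input.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 1.0])
w = np.zeros(3)
eta = 0.5

for epoch in range(5000):
    o = g(X @ w)                              # outputs o_d
    w += eta * ((t - o) * o * (1 - o)) @ X    # sigmoid unit training rule

print(np.round(g(X @ w)))                     # approaches the targets 0 1 1 1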
Expressivity of Neural Networks
• Single units can learn any linearly separable function (so not XOR; see the sketch below)
• A single layer of units can learn any set of linear inequalities (a convex region)
• Two layers can learn (approximate) any continuous function
• Three layers can learn (approximate) any computable function
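As a concrete illustration of the jump in expressive power (not from the original slides): XOR is not linearly separable, so no single unit computes it, but a two-layer network can. The weights below are hand-picked for illustration rather than learned:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-set weights: hidden unit 1 acts like OR, hidden unit 2 like NAND,
# and the output unit ANDs them together, which yields XOR.
def xor_net(x1, x2):
    h1 = g(20 * x1 + 20 * x2 - 10)     # ~OR
    h2 = g(-20 * x1 - 20 * x2 + 30)    # ~NAND
    return g(20 * h1 + 20 * h2 - 30)   # ~AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))   # prints 0, 1, 1, 0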
Character Recognition Demo
Backprop Demo 1
• http://www.neuro.sfc.keio.ac.jp/~masato/jv/sl/BP.html
• Local version: BP.html
Backprop Demo 2
• http://www.williewheeler.com/software/bnn.html
• Local version: bnn.html
Modeling the Brain
• Backpropagation is the most commonly used algorithm for supervised learning with feed-forward neural networks
• But most neuroscientists believe that the brain does not implement backprop
• Many other learning rules have been studied
Hebbian Learning
• Alternative to backprop for unsupervised learning
• Increase weights on connected neurons whenever both fire simultaneously (update sketched below)
• Neurologically plausible (Hebb 1949)
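A minimal sketch of one Hebbian update, assuming the simple rate-based form delta w_ij = eta * y_i * x_j; the activities and learning rate are illustrative:

import numpy as np

x = np.array([1.0, 0.0, 1.0])   # activities of three pre-synaptic (input) neurons
y = np.array([1.0, 1.0])        # activities of two post-synaptic (output) neurons
W = np.zeros((2, 3))            # connection weights from inputs to outputs
eta = 0.1

# Strengthen w_ij whenever input j and output i fire at the same time
W += eta * np.outer(y, x)
print(W)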
Self-Organizing Maps
• Unsupervised method for clustering data
• Learns a “winner take all” network where just one output neuron is on for each cluster
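A bare-bones winner-take-all update in the spirit of a self-organizing map. This is a sketch only: the data, number of output neurons, and learning rate are assumptions, and neighborhood updates are omitted:

import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 2))      # toy 2-D inputs to cluster
units = rng.random((4, 2))       # weight vector of each of 4 output neurons
eta = 0.2

for x in data:
    winner = np.argmin(np.linalg.norm(units - x, axis=1))  # best-matching unit
    units[winner] += eta * (x - units[winner])              # move only the winner toward the input
print(units)   # each row settles in a different region of the input space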
Why “Self-Organizing”
Recurrent Neural Networks
• Include time-delay feedback loops
• Can handle tasks with temporal structure, such as sequence prediction (sketched below)
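A minimal sketch of one recurrent step with a time-delay feedback loop, h_t = tanh(W_x x_t + W_h h_{t-1}); all shapes and weights are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 2))        # input-to-hidden weights
W_h = rng.normal(size=(3, 3))        # hidden-to-hidden (feedback) weights
h = np.zeros(3)                      # hidden state carried across time steps

sequence = rng.normal(size=(5, 2))   # a short input sequence
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h) # feedback: h depends on the previous h
print(h)                             # final state summarizes the sequence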