CSE 473 Introduction to Artificial Intelligence Neural Networks
CSE 473 Introduction to Artificial Intelligence
Neural Networks
Henry Kautz
Spring 2006
Training a Single Neuron
• Idea: adjust weights to reduce sum of squared errors over training set
  – Error = difference between actual and intended output
• Algorithm: gradient descent
  – Calculate derivative (slope) of error function
  – Take a small step in the “downward” direction
  – Step size is the “training rate”
• Single-layer network: can train each unit separately
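The gradient descent loop described in these bullets can be sketched in a few lines. This is a minimal illustration, not code from the course: the function name `gradient_descent`, the example error function E(w) = (w - 3)^2, and the step count are all assumptions chosen for the example.

```python
def gradient_descent(derivative, w0, training_rate=0.1, steps=100):
    """Repeatedly step against the slope of the error function."""
    w = w0
    for _ in range(steps):
        # Take a small step in the "downward" direction;
        # the step size is scaled by the training rate.
        w -= training_rate * derivative(w)
    return w

# Hypothetical error function E(w) = (w - 3)^2, with derivative 2 (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```

With this training rate each step shrinks the distance to the minimum by a constant factor, so `w_min` ends up very close to 3.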
Gradient Descent
$$E = \frac{1}{2} \sum_d (t_d - o_d)^2 = \frac{1}{2} \sum_d \bigl(t_d - g(\mathbf{w} \cdot \mathbf{x}_d)\bigr)^2$$
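As a rough sketch, the error being minimized (half the sum of squared differences between target t_d and output o_d over the training set) can be computed as follows. The function names and the tiny data set are illustrative assumptions, and the squashing function g is assumed here to be the sigmoid that the slides use for sigmoid units.

```python
import math

def sigmoid(z):
    """Assumed squashing function g."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(w, x, g=sigmoid):
    """o = g(w . x) for one training example."""
    return g(sum(w_i * x_i for w_i, x_i in zip(w, x)))

def sum_squared_error(w, examples, g=sigmoid):
    """E = 1/2 * sum over examples d of (t_d - o_d)^2."""
    return 0.5 * sum((t - unit_output(w, x, g)) ** 2 for x, t in examples)

# Hypothetical training set of (input vector, target) pairs.
examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
E = sum_squared_error([0.0, 0.0], examples)
print(E)
```

With all weights at zero every output is 0.5, so each example contributes 0.25 to the sum.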
Computing Partial Derivatives
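$$
\begin{aligned}
\frac{\partial E}{\partial w_i}
&= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d 2\,(t_d - o_d)\, \frac{\partial}{\partial w_i} (t_d - o_d) \\
&= \sum_d (t_d - o_d)\, \frac{\partial}{\partial w_i} \bigl(t_d - g(\mathbf{w} \cdot \mathbf{x}_d)\bigr) \\
&= -\sum_d (t_d - o_d)\, g'(\mathbf{w} \cdot \mathbf{x}_d)\, x_{i,d}
\end{aligned}
$$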
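One way to sanity-check a derivation like this is to compare the analytic partial derivatives with central finite differences of E. The sketch below assumes a sigmoid for g (so that g' = g (1 - g)); the helper names, the weights, and the three training examples are made up for the check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def error(w, examples):
    """E = 1/2 * sum_d (t_d - g(w . x_d))^2."""
    return 0.5 * sum((t - sigmoid(dot(w, x))) ** 2 for x, t in examples)

def analytic_gradient(w, examples):
    """dE/dw_i = -sum_d (t_d - o_d) g'(w . x_d) x_{i,d}."""
    grad = [0.0] * len(w)
    for x, t in examples:
        z = dot(w, x)
        common = -(t - sigmoid(z)) * sigmoid_prime(z)
        for i, x_i in enumerate(x):
            grad[i] += common * x_i
    return grad

def numeric_gradient(w, examples, eps=1e-6):
    """Central-difference estimate of each partial derivative."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grad.append((error(up, examples) - error(down, examples)) / (2 * eps))
    return grad

# Hypothetical data: first input component acts as a constant bias input.
examples = [([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 0.0], 0.0), ([1.0, 1.0, 1.0], 1.0)]
w = [0.1, -0.2, 0.3]
print(analytic_gradient(w, examples))
print(numeric_gradient(w, examples))
```

The two printed vectors should agree to several decimal places if the derivation is right.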
Single Unit Training Rule
Adjust weight i in proportion to…
• Training rate
• Error
• Derivative of the “squashing function”
• Degree to which input i was active
$$\Delta w_i = \eta \sum_d (t_d - o_d)\, g'(\mathbf{w} \cdot \mathbf{x}_d)\, x_{i,d}$$
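The four factors of the training rule (training rate, error, derivative of the squashing function, input activity) map directly onto a batch update. The following is a sketch under stated assumptions: tanh stands in for a generic squashing function g, the targets ±0.9 and the AND-style data set are invented for the example, and `batch_update` is a hypothetical helper name.

```python
import math

def dot(w, x):
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def batch_update(w, examples, g, g_prime, training_rate):
    """Add training_rate * sum_d (t_d - o_d) g'(w . x_d) x_{i,d} to each w_i."""
    delta = [0.0] * len(w)
    for x, t in examples:
        z = dot(w, x)
        scale = training_rate * (t - g(z)) * g_prime(z)  # rate * error * slope
        for i, x_i in enumerate(x):
            delta[i] += scale * x_i                      # * input activity
    return [w_i + d_i for w_i, d_i in zip(w, delta)]

def sum_squared_error(w, examples, g):
    return 0.5 * sum((t - g(dot(w, x))) ** 2 for x, t in examples)

def tanh_prime(z):
    return 1.0 - math.tanh(z) ** 2

# Hypothetical AND-like task; the first input is a constant bias input.
and_examples = [([1.0, 0.0, 0.0], -0.9), ([1.0, 0.0, 1.0], -0.9),
                ([1.0, 1.0, 0.0], -0.9), ([1.0, 1.0, 1.0], 0.9)]
w = [0.0, 0.0, 0.0]
error_before = sum_squared_error(w, and_examples, math.tanh)
for _ in range(1000):
    w = batch_update(w, and_examples, math.tanh, tanh_prime, training_rate=0.05)
error_after = sum_squared_error(w, and_examples, math.tanh)
print(error_before, error_after)
```

Passing `g` and `g_prime` as arguments keeps the update independent of any particular squashing function.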
Sigmoid Units
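Using the sigmoid squashing function $g(\mathit{in}) = \dfrac{1}{1 + e^{-\mathit{in}}}$ is nice because $g'(\mathit{in}) = g(\mathit{in})\,(1 - g(\mathit{in}))$. So:

$$
\begin{aligned}
\Delta w_i
&= \eta \sum_d (t_d - o_d)\, g'(\mathbf{w} \cdot \mathbf{x}_d)\, x_{i,d} \\
&= \eta \sum_d (t_d - o_d)\, g(\mathbf{w} \cdot \mathbf{x}_d)\,\bigl(1 - g(\mathbf{w} \cdot \mathbf{x}_d)\bigr)\, x_{i,d} \\
&= \eta \sum_d (t_d - o_d)\, o_d\, (1 - o_d)\, x_{i,d}
\end{aligned}
$$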
Sigmoid Unit Training Rule
Adjust weight i in proportion to…
• Training rate
• Error
• Degree to which output is ambiguous
• Degree to which input i was active
$$\Delta w_i = \eta \sum_d (t_d - o_d)\, o_d\, (1 - o_d)\, x_{i,d}$$
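A minimal sketch of the sigmoid unit training rule, applied in batch mode to a small OR task. The data set, the constant bias input in position 0, the training rate of 0.5, and the epoch count are assumptions made for the illustration, not values from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(w, x):
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))

def sigmoid_batch_update(w, examples, training_rate):
    """Add training_rate * sum_d (t_d - o_d) o_d (1 - o_d) x_{i,d} to each w_i."""
    delta = [0.0] * len(w)
    for x, t in examples:
        o = output(w, x)
        # error * "ambiguity" of the output: o (1 - o) peaks at o = 0.5
        scale = training_rate * (t - o) * o * (1.0 - o)
        for i, x_i in enumerate(x):
            delta[i] += scale * x_i
    return [w_i + d_i for w_i, d_i in zip(w, delta)]

# Hypothetical OR task; the first input is a constant bias input.
or_examples = [([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 1.0),
               ([1.0, 1.0, 0.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w = [0.0, 0.0, 0.0]
for _ in range(5000):
    w = sigmoid_batch_update(w, or_examples, training_rate=0.5)

for x, t in or_examples:
    print(x[1:], t, round(output(w, x), 3))
```

Note that the update needs only the unit's output o, not a separate evaluation of g', which is what makes the sigmoid convenient here.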
Expressivity of Neural Networks
• A single unit can learn any linearly separable function (one linear inequality)
• A single layer of units can learn any set of linear inequalities (convex region)
• Two layers can approximate any continuous function, given enough hidden units
• Three layers can approximate arbitrary functions
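To make the gap between one unit and two layers concrete: XOR is not linearly separable, so no single unit computes it, but a two-layer network of threshold units does. The weights below are hand-picked for illustration (nothing is learned), and the hard threshold `step` is an assumption that replaces the sigmoid for readability.

```python
def step(z):
    """Hard threshold unit: fires when its linear inequality holds."""
    return 1 if z > 0 else 0

def unit(weights, bias, inputs):
    return step(bias + sum(w * x for w, x in zip(weights, inputs)))

def xor_net(x1, x2):
    # Hidden layer: two linear inequalities over the inputs.
    h_or = unit([1.0, 1.0], -0.5, [x1, x2])    # x1 + x2 > 0.5
    h_and = unit([1.0, 1.0], -1.5, [x1, x2])   # x1 + x2 > 1.5
    # Output layer: "OR but not AND".
    return unit([1.0, -1.0], -0.5, [h_or, h_and])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Each hidden unit carves out one half-plane, and the output unit combines them into a region that no single half-plane can describe.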
Character Recognition Demo
BackProp Demo 1
• http://www.neuro.sfc.keio.ac.jp/~masato/jv/sl/BP.html
• Local version: BP.html
Backprop Demo 2
• http://www.williewheeler.com/software/bnn.html
• Local version: bnn.html
Modeling the Brain
• Backpropagation is the most commonly used algorithm for supervised learning with feed-forward neural networks
• But most neuroscientists believe that the brain does not implement backprop
• Many other learning rules have been studied
Hebbian Learning
• Alternative to backprop for unsupervised learning
• Increase weights on connected neurons whenever both fire simultaneously
• Neurologically plausible (Hebb 1949)
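In its simplest form, the Hebbian idea can be written as a weight change proportional to the product of the two neurons' activities. The sketch below assumes binary activities, a weight matrix indexed as `w[i][j]` from presynaptic neuron i to postsynaptic neuron j, and a hypothetical helper name `hebbian_update`; real Hebbian variants usually add normalization or decay to keep weights bounded.

```python
def hebbian_update(w, pre, post, rate=0.1):
    """Return new weights with w[i][j] increased by rate * pre[i] * post[j]."""
    return [[w[i][j] + rate * pre[i] * post[j] for j in range(len(post))]
            for i in range(len(pre))]

# Two presynaptic and two postsynaptic neurons, all weights initially zero.
w = [[0.0, 0.0], [0.0, 0.0]]
# Presynaptic neuron 0 fires together with both postsynaptic neurons.
w = hebbian_update(w, pre=[1.0, 0.0], post=[1.0, 1.0])
print(w)
```

Only the connections leaving the active presynaptic neuron are strengthened; no target output or error signal is involved.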
Self-Organizing Maps
• Unsupervised method for clustering data
• Learns a “winner take all” network where just one output neuron is on for each cluster
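The "winner take all" behaviour can be sketched as a competitive learning step: the output neuron whose weight vector is closest to the input wins, and only that neuron's weights move toward the input. This is a deliberate simplification and an assumption on my part: a full self-organizing map also updates the winner's neighbours on a grid, which is omitted here, and the names and numbers are illustrative.

```python
def winner(prototypes, x):
    """Index of the output neuron whose weight vector is closest to x."""
    def sq_dist(p):
        return sum((p_k - x_k) ** 2 for p_k, x_k in zip(p, x))
    return min(range(len(prototypes)), key=lambda j: sq_dist(prototypes[j]))

def competitive_step(prototypes, x, rate=0.5):
    """Move only the winning neuron's weights toward the input x."""
    j = winner(prototypes, x)
    updated = [list(p) for p in prototypes]
    updated[j] = [p_k + rate * (x_k - p_k) for p_k, x_k in zip(prototypes[j], x)]
    return updated

# Two output neurons with hypothetical initial weight vectors.
prototypes = [[0.0, 0.0], [10.0, 10.0]]
x = [1.0, 1.0]
print(winner(prototypes, x))
new_prototypes = competitive_step(prototypes, x)
print(new_prototypes)
```

Repeating this step over a data set pulls each weight vector toward the inputs it wins, so each output neuron comes to represent one cluster.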
Why “Self-Organizing”
Recurrent Neural Networks
• Include time-delay feedback loops
• Can handle temporal data tasks, such as sequence prediction
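A minimal sketch of a time-delay feedback loop: a single unit whose previous output is fed back as an extra input at the next time step. The tanh squashing function, the weight values, and the function name `run_recurrent_unit` are assumptions for the illustration; no training is shown.

```python
import math

def run_recurrent_unit(inputs, w_in=1.0, w_rec=0.5):
    """h_t = tanh(w_in * x_t + w_rec * h_{t-1}), starting from h_0 = 0."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # feedback from the previous step
        outputs.append(h)
    return outputs

# A single input pulse followed by silence.
outputs = run_recurrent_unit([1.0, 0.0, 0.0])
print(outputs)
```

The output stays positive after the input returns to zero, so the unit's response depends on the sequence it has seen and not only on the current input.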