CSE 473: Introduction to Artificial Intelligence
Neural Networks
Henry Kautz
Spring 2006
Training a Single Neuron
• Idea: adjust weights to reduce sum of squared errors over the training set
  – Error = difference between actual and intended output
• Algorithm: gradient descent (sketched below)
  – Calculate the derivative (slope) of the error function
  – Take a small step in the “downward” direction
  – Step size is the “training rate”
• Single-layer network: can train each unit separately
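As a rough illustration (not from the original slides), here is a minimal gradient-descent loop in Python; the one-parameter error function E(w) = (w - 3)^2 and the training rate are placeholder choices:

# Minimal sketch of gradient descent on a one-parameter error function.
# The error E(w) = (w - 3)^2 and the training rate eta are illustrative
# assumptions, not values from the slides.
def error(w):
    return (w - 3.0) ** 2

def error_derivative(w):
    return 2.0 * (w - 3.0)          # slope of the error function

w = 0.0                             # initial weight
eta = 0.1                           # training rate (step size)
for step in range(50):
    w -= eta * error_derivative(w)  # small step in the "downward" direction
print(w)                            # approaches the minimum at w = 3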
Gradient Descent
E = \frac{1}{2}\sum_d (t_d - o_d)^2 = \frac{1}{2}\sum_d \left( t_d - g\Big(\sum_j w_j\, x_{j,d}\Big) \right)^2

where t_d is the target output, o_d = g(\sum_j w_j x_{j,d}) is the unit's actual output on training example d, and g is the squashing function.
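A small numeric check of this error function (the toy data, sigmoid squashing function, and weights below are assumptions for illustration):

import numpy as np

def g(z):                          # sigmoid squashing function (see later slides)
    return 1.0 / (1.0 + np.exp(-z))

# Toy training set: each row of X is one example d, t holds the target outputs.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([1.0, 0.0, 1.0])
w = np.array([0.2, -0.1])          # arbitrary weights

o = g(X @ w)                       # o_d = g(sum_j w_j x_{j,d}) for every example d
E = 0.5 * np.sum((t - o) ** 2)     # E = 1/2 sum_d (t_d - o_d)^2
print(E)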
Computing Partial Derivatives
\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
  = \frac{1}{2}\sum_d 2\,(t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d)
  = \sum_d (t_d - o_d)\,\frac{\partial}{\partial w_i}\left( t_d - g\Big(\sum_j w_j\, x_{j,d}\Big) \right)
  = -\sum_d (t_d - o_d)\; g'\Big(\sum_j w_j\, x_{j,d}\Big)\, x_{i,d}
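As a sanity check (not part of the original slides), the analytic gradient derived above can be compared with a finite-difference estimate; the toy data and weights are arbitrary:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    s = g(z)
    return s * (1.0 - s)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([1.0, 0.0, 1.0])
w = np.array([0.2, -0.1])

def E(w):
    return 0.5 * np.sum((t - g(X @ w)) ** 2)

# Analytic form: dE/dw_i = -sum_d (t_d - o_d) g'(in_d) x_{i,d}
in_d = X @ w
analytic = -((t - g(in_d)) * g_prime(in_d)) @ X

# Numerical estimate by central differences
eps = 1e-6
numeric = np.array([(E(w + eps * np.eye(2)[i]) - E(w - eps * np.eye(2)[i])) / (2 * eps)
                    for i in range(2)])
print(analytic, numeric)           # the two should agree closely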
Single Unit Training Rule
Adjust weight i in proportion to…
• Training rate
• Error
• Derivative of the “squashing function”
• Degree to which input i was active
\Delta w_i = \eta \sum_d (t_d - o_d)\; g'\Big(\sum_j w_j\, x_{j,d}\Big)\, x_{i,d}
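A sketch of one batch update implementing this rule (the toy data, initial weights, and training rate are illustrative assumptions):

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    s = g(z)
    return s * (1.0 - s)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # one row per example d
t = np.array([1.0, 0.0, 1.0])                        # target outputs t_d
w = np.array([0.2, -0.1])
eta = 0.5                                            # training rate

in_d = X @ w                                         # weighted input per example
o = g(in_d)                                          # actual outputs o_d
w = w + eta * ((t - o) * g_prime(in_d)) @ X          # the rule above, for all i at once
print(w)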
Sigmoid Units
Using the sigmoid squashing function

g(in) = \frac{1}{1 + e^{-in}}

is nice because g'(in) = g(in)\,(1 - g(in)). So:

\Delta w_i
  = \eta \sum_d (t_d - o_d)\; g'\Big(\sum_j w_j\, x_{j,d}\Big)\, x_{i,d}
  = \eta \sum_d (t_d - o_d)\; g\Big(\sum_j w_j\, x_{j,d}\Big)\Big(1 - g\Big(\sum_j w_j\, x_{j,d}\Big)\Big)\, x_{i,d}
  = \eta \sum_d (t_d - o_d)\; o_d\,(1 - o_d)\; x_{i,d}
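A quick numeric confirmation of the identity g'(in) = g(in)(1 - g(in)); the test points are arbitrary:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 1.5])
eps = 1e-6
numeric = (g(z + eps) - g(z - eps)) / (2 * eps)   # g'(in) by central differences
identity = g(z) * (1.0 - g(z))                    # g(in)(1 - g(in))
print(numeric, identity)                          # the two should agree closely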
Sigmoid Unit Training Rule
Adjust weight i in proportion to…
• Training rate
• Error
• Degree to which output is ambiguous (o_d(1 - o_d) is largest when o_d ≈ 0.5)
• Degree to which input i was active

\Delta w_i = \eta \sum_d (t_d - o_d)\; o_d\,(1 - o_d)\; x_{i,d}
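Putting the pieces together: a minimal sigmoid unit trained with this rule on a toy OR problem. The data, bias handling, training rate, and epoch count are all illustrative assumptions:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy OR problem; the constant 1 in the last column acts as a bias input.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 1.0])
w = np.zeros(3)
eta = 0.5

for epoch in range(5000):
    o = g(X @ w)                              # outputs o_d
    w += eta * ((t - o) * o * (1 - o)) @ X    # sigmoid unit training rule

print(np.round(g(X @ w)))                     # approaches the targets 0 1 1 1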
Expressivity of Neural Networks
• Single units can learn any linearly separable function (so not XOR; see the sketch below)
• A single layer of units can learn any set of linear inequalities (a convex region)
• Two layers can learn (approximate) any continuous function
• Three layers can learn (approximate) any computable function
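As a concrete illustration of the jump in expressive power (not from the original slides): XOR is not linearly separable, so no single unit computes it, but a two-layer network can. The weights below are hand-picked for illustration rather than learned:

import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-set weights: hidden unit 1 acts like OR, hidden unit 2 like NAND,
# and the output unit ANDs them together, which yields XOR.
def xor_net(x1, x2):
    h1 = g(20 * x1 + 20 * x2 - 10)     # ~OR
    h2 = g(-20 * x1 - 20 * x2 + 30)    # ~NAND
    return g(20 * h1 + 20 * h2 - 30)   # ~AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2)))   # prints 0, 1, 1, 0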
Character Recognition Demo
Backprop Demo 1
• http://www.neuro.sfc.keio.ac.jp/~masato/jv/sl/BP.html
• Local version: BP.html
Backprop Demo 2
• http://www.williewheeler.com/software/bnn.html
• Local version: bnn.html
Modeling the Brain
• Backpropagation is the most commonly used algorithm for supervised learning with feed-forward neural networks
• But most neuroscientists believe that the brain does not implement backprop
• Many other learning rules have been studied
Hebbian Learning
• Alternative to backprop for unsupervised learning
• Increase weights on connected neurons whenever both fire simultaneously (update sketched below)
• Neurologically plausible (Hebb 1949)
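A minimal sketch of one Hebbian update, assuming the simple rate-based form delta w_ij = eta * y_i * x_j; the activities and learning rate are illustrative:

import numpy as np

x = np.array([1.0, 0.0, 1.0])   # activities of three pre-synaptic (input) neurons
y = np.array([1.0, 1.0])        # activities of two post-synaptic (output) neurons
W = np.zeros((2, 3))            # connection weights from inputs to outputs
eta = 0.1

# Strengthen w_ij whenever input j and output i fire at the same time
W += eta * np.outer(y, x)
print(W)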
Self-Organizing Maps
• Unsupervised method for clustering data
• Learns a “winner take all” network where just one output neuron is on for each cluster
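A bare-bones winner-take-all update in the spirit of a self-organizing map. This is a sketch only: the data, number of output neurons, and learning rate are assumptions, and neighborhood updates are omitted:

import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 2))      # toy 2-D inputs to cluster
units = rng.random((4, 2))       # weight vector of each of 4 output neurons
eta = 0.2

for x in data:
    winner = np.argmin(np.linalg.norm(units - x, axis=1))  # best-matching unit
    units[winner] += eta * (x - units[winner])              # move only the winner toward the input
print(units)   # each row settles in a different region of the input space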
Why “Self-Organizing”
Recurrent Neural Networks
• Include time-delay feedback loops
• Can handle tasks with temporal structure, such as sequence prediction (sketched below)
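A minimal sketch of one recurrent step with a time-delay feedback loop, h_t = tanh(W_x x_t + W_h h_{t-1}); all shapes and weights are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 2))        # input-to-hidden weights
W_h = rng.normal(size=(3, 3))        # hidden-to-hidden (feedback) weights
h = np.zeros(3)                      # hidden state carried across time steps

sequence = rng.normal(size=(5, 2))   # a short input sequence
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h) # feedback: h depends on the previous h
print(h)                             # final state summarizes the sequence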