CSE 473 Introduction to Artificial Intelligence Neural Networks


Page 1: CSE 473 Introduction to Artificial Intelligence Neural Networks

CSE 473 Introduction to Artificial Intelligence

Neural Networks

Henry Kautz

Spring 2006

Page 11: CSE 473 Introduction to Artificial Intelligence Neural Networks

Training a Single Neuron

• Idea: adjust weights to reduce the sum of squared errors over the training set
– Error = difference between actual and intended output

• Algorithm: gradient descent (see the code sketch after this list)
– Calculate the derivative (slope) of the error function
– Take a small step in the “downward” direction
– Step size is the “training rate”

• Single-layer network: can train each unit separately
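A minimal sketch of this gradient-descent procedure for a single unit, assuming a sigmoid squashing function; the toy data set, learning rate, and function names are illustrative, not from the slides:

```python
import math

def g(z):
    """Squashing function (sigmoid assumed here)."""
    return 1.0 / (1.0 + math.exp(-z))

def g_prime(z):
    """Derivative (slope) of the squashing function."""
    return g(z) * (1.0 - g(z))

def train_single_unit(examples, eta=0.5, epochs=2000):
    """Gradient descent on squared error for one unit.

    examples: list of (inputs, target) pairs, where inputs[0] is a
    constant 1.0 acting as the bias input; eta is the training rate.
    """
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, t in examples:
            in_d = sum(wi * xi for wi, xi in zip(w, x))
            o = g(in_d)
            # Step each weight a small amount "downhill" on the error surface
            for i in range(n):
                w[i] += eta * (t - o) * g_prime(in_d) * x[i]
    return w

# Toy training set: logical AND of two inputs (leading 1.0 is the bias input)
weights = train_single_unit([([1.0, 0.0, 0.0], 0.0), ([1.0, 0.0, 1.0], 0.0),
                             ([1.0, 1.0, 0.0], 0.0), ([1.0, 1.0, 1.0], 1.0)])
```

The sketch updates the weights example by example (incremental gradient descent); summing the updates over the whole training set before taking a step would be the batch form.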

Page 12: CSE 473 Introduction to Artificial Intelligence Neural Networks

Gradient Descent

For training example d, with target output t_d and actual output o_d, the squared error is

$$
E_d \;=\; \frac{1}{2}\,(t_d - o_d)^2 \;=\; \frac{1}{2}\,\Bigl(t_d - g\Bigl(\sum_i w_i\, x_{i,d}\Bigr)\Bigr)^2
$$

Page 13: CSE 473 Introduction to Artificial Intelligence Neural Networks

Computing Partial Derivatives

$$
\begin{aligned}
\frac{\partial E_d}{\partial w_i}
  &= \frac{\partial}{\partial w_i}\,\frac{1}{2}\,(t_d - o_d)^2 \\
  &= (t_d - o_d)\,\frac{\partial}{\partial w_i}\,(t_d - o_d) \\
  &= -\,(t_d - o_d)\,\frac{\partial o_d}{\partial w_i} \\
  &= -\,(t_d - o_d)\,\frac{\partial}{\partial w_i}\,g\Bigl(\sum_j w_j\, x_{j,d}\Bigr) \\
  &= -\,(t_d - o_d)\; g'\Bigl(\sum_j w_j\, x_{j,d}\Bigr)\, x_{i,d}
\end{aligned}
$$

Page 14: CSE 473 Introduction to Artificial Intelligence Neural Networks

Single Unit Training Rule

Adjust weight i in proportion to…

• Training rate

• Error

• Derivative of the “squashing function”

• Degree to which input i was active

$$
\Delta w_i \;=\; \eta\,(t_d - o_d)\; g'\Bigl(\sum_j w_j\, x_{j,d}\Bigr)\, x_{i,d}
$$
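As a purely illustrative instance of this rule (the numbers are invented for the example): with training rate η = 0.1, error t_d − o_d = 0.7, squashing-function slope g'(in_d) = 0.2, and input x_{i,d} = 1,

$$
\Delta w_i = 0.1 \times 0.7 \times 0.2 \times 1 = 0.014
$$

a small step that nudges the unit's output toward the target.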

Page 15: CSE 473 Introduction to Artificial Intelligence Neural Networks

Sigmoid Units

Using the sigmoid squashing function

$$
g(in) \;=\; \frac{1}{1 + e^{-in}}
$$

is nice because $g'(in) = g(in)\,(1 - g(in))$. So:

$$
\begin{aligned}
\Delta w_i &= \eta\,(t_d - o_d)\; g'\Bigl(\sum_j w_j\, x_{j,d}\Bigr)\, x_{i,d} \\
           &= \eta\,(t_d - o_d)\; g\Bigl(\sum_j w_j\, x_{j,d}\Bigr)\Bigl(1 - g\Bigl(\sum_j w_j\, x_{j,d}\Bigr)\Bigr)\, x_{i,d} \\
           &= \eta\,(t_d - o_d)\; o_d\,(1 - o_d)\, x_{i,d}
\end{aligned}
$$

Page 16: CSE 473 Introduction to Artificial Intelligence Neural Networks

Sigmoid Unit Training Rule

Adjust weight i in proportion to…

• Training rate

• Error

• Degree to which output is ambiguous

• Degree to which input i was active

$$
\Delta w_i \;=\; \eta\,(t_d - o_d)\; o_d\,(1 - o_d)\, x_{i,d}
$$
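Written as a single update function (an illustrative sketch, not code from the course), the rule needs only the output o itself, since g'(in) = o(1 − o):

```python
def sigmoid_weight_delta(eta, t, o, x_i):
    """Weight change for input i of a sigmoid unit:
    training rate * error * output ambiguity * input activity."""
    return eta * (t - o) * o * (1.0 - o) * x_i
```

The factor o(1 − o) is largest when o is near 0.5, which is the sense in which the update scales with how “ambiguous” the output is.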

Page 17: CSE 473 Introduction to Artificial Intelligence Neural Networks

Expressivity of Neural Networks

• Single units can learn any linear function

• Single layer of units can learn any set of linear inequalities (convex region)

• Two layers can learn any continuous function

• Three layers can learn any computable function

Page 22: CSE 473 Introduction to Artificial Intelligence Neural Networks

Character Recognition Demo

Page 23: CSE 473 Introduction to Artificial Intelligence Neural Networks

BackProp Demo 1

• http://www.neuro.sfc.keio.ac.jp/~masato/jv/sl/BP.html

• Local version: BP.html

Page 24: CSE 473 Introduction to Artificial Intelligence Neural Networks

Backprop Demo 2

• http://www.williewheeler.com/software/bnn.html

• Local version: bnn.html

Page 25: CSE 473 Introduction to Artificial Intelligence Neural Networks

Modeling the Brain

• Backpropagation is the most commonly used algorithm for supervised learning with feed-forward neural networks

• But most neuroscientists believe that the brain does not implement backprop

• Many other learning rules have been studied

Page 26: CSE 473 Introduction to Artificial Intelligence Neural Networks

Hebbian Learning

• Alternative to backprop for unsupervised learning
• Increase weights on connected neurons whenever both fire simultaneously
• Neurologically plausible (Hebb 1949)
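A minimal sketch of the basic Hebbian rule Δw[i][j] = η · x_i · y_j, assuming pre- and post-synaptic activations are given as lists; the names and learning rate are illustrative, and practical variants add weight decay or normalization, which are omitted here:

```python
def hebbian_update(w, pre, post, eta=0.01):
    """One Hebbian step: strengthen w[i][j] whenever pre-synaptic unit i
    and post-synaptic unit j are active at the same time."""
    for i, x_i in enumerate(pre):
        for j, y_j in enumerate(post):
            w[i][j] += eta * x_i * y_j
    return w
```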

Page 27: CSE 473 Introduction to Artificial Intelligence Neural Networks

Self-Organizing Maps

• Unsupervised method for clustering data

• Learns a “winner take all” network where just one output neuron is on for each cluster
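A minimal winner-take-all sketch, assuming each output unit k keeps a weight vector weights[k]; this is illustrative only, and a full self-organizing map would also pull the winner's neighbours on the map grid toward the input:

```python
def winner_take_all_step(weights, x, eta=0.1):
    """The output unit whose weight vector is closest to input x 'wins'
    and is pulled a little toward x; that unit labels x's cluster."""
    winner = min(range(len(weights)),
                 key=lambda k: sum((wk - xk) ** 2
                                   for wk, xk in zip(weights[k], x)))
    weights[winner] = [wk + eta * (xk - wk)
                       for wk, xk in zip(weights[winner], x)]
    return winner
```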

Page 28: CSE 473 Introduction to Artificial Intelligence Neural Networks

Why “Self-Organizing”?

Page 29: CSE 473 Introduction to Artificial Intelligence Neural Networks

Recurrent Neural Networks

• Include time-delay feedback loops

• Can handle tasks involving temporal data, such as sequence prediction
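A minimal sketch of the time-delay feedback idea for a single recurrent unit with made-up weights; real recurrent networks use many units and are trained, for example, by backpropagation through time:

```python
import math

def rnn_step(x_t, h_prev, w_in=0.8, w_rec=0.5, b=0.0):
    """New hidden state depends on the current input and, through a
    feedback loop, on the previous time step's hidden state."""
    return math.tanh(w_in * x_t + w_rec * h_prev + b)

# Feed a short sequence, carrying the hidden state forward in time
h = 0.0
for x_t in [0.5, -0.2, 0.9]:   # illustrative input sequence
    h = rnn_step(x_t, h)
```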