Backpropagation: Introduction to Artificial Intelligence (COS302), Michael L. Littman, Fall 2001


Page 1: Backpropagation

Introduction to Artificial Intelligence
COS302
Michael L. Littman
Fall 2001

Page 2: Administration

Questions, concerns?

Page 3: Classification Perceptron

[Figure: a perceptron. Inputs x1, x2, x3, …, xD and a constant 1 feed a sum unit through weights w1, w2, w3, …, wD and w0; the resulting "net" passes through a squashing function g to produce "out".]

Page 4: Perceptrons

Recall that the squashing function makes the output look more like bits: 0 or 1 decisions.

What if we give it inputs that are also bits?
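
As a concrete illustration (a minimal Python sketch; the slides do not pin down g, so the usual logistic function is assumed here), the squashing function pushes weighted sums toward 0 or 1:

    import math

    def g(z):
        # Logistic squashing function: maps any real number into (0, 1).
        return 1.0 / (1.0 + math.exp(-z))

    print(g(-5.0))  # ~0.0067: effectively a 0 decision
    print(g(0.0))   # 0.5: the undecided middle
    print(g(5.0))   # ~0.9933: effectively a 1 decision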

Page 5: A Boolean Function

A B C D E F G | out
1 0 1 0 1 0 1 |  0
0 1 1 0 0 0 1 |  0
0 0 1 0 0 1 0 |  0
1 0 0 0 1 0 0 |  1
0 0 1 1 0 0 0 |  1
1 1 1 0 1 0 1 |  0
0 1 0 1 0 0 1 |  1
1 1 1 1 1 0 1 |  1
1 1 1 1 1 1 1 |  1
1 1 1 0 0 1 1 |  0

Page 6: Think Graphically

Can a perceptron learn this?

[Figure: four example points plotted on C and D axes, three labeled 1 and one labeled 0.]

Page 7: Ands and Ors

out(x) = g(sum_k w_k x_k)

How can we set the weights to represent (v1)(v2)(~v7)? AND:
w_i = 0, except w1 = 10, w2 = 10, w7 = -10, w0 = -15 (5-max)

How about ~v3 + v4 + ~v8? OR:
w_i = 0, except w3 = -10, w4 = 10, w8 = -10, w0 = 15 (-5-min)
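
A quick check of these weight settings (a Python sketch using the slide's weights over eight inputs; the unit helper is my own):

    import math

    def g(z):
        return 1.0 / (1.0 + math.exp(-z))

    def unit(w, w0, x):
        # One sigmoid unit: out = g(w0 + sum_k w_k x_k).
        return g(w0 + sum(wk * xk for wk, xk in zip(w, x)))

    # AND (v1)(v2)(~v7): w1 = w2 = 10, w7 = -10, w0 = -15.
    w_and = [10, 10, 0, 0, 0, 0, -10, 0]
    print(unit(w_and, -15, [1, 1, 0, 0, 0, 0, 0, 0]) > 0.5)  # True: formula satisfied
    print(unit(w_and, -15, [1, 1, 0, 0, 0, 0, 1, 0]) > 0.5)  # False: v7 is on

    # OR ~v3 + v4 + ~v8: w3 = -10, w4 = 10, w8 = -10, w0 = 15.
    w_or = [0, 0, -10, 10, 0, 0, 0, -10]
    print(unit(w_or, 15, [0, 0, 1, 0, 0, 0, 0, 1]) > 0.5)    # False: all literals false
    print(unit(w_or, 15, [0, 0, 0, 0, 0, 0, 0, 1]) > 0.5)    # True: ~v3 holds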

Page 8: Majority

Are at least half the bits on?

Set all weights to 1, and w0 to -n/2.

A B C D E F G | out
1 0 1 0 1 0 1 |  1
0 1 1 0 0 0 1 |  0
0 0 1 0 0 1 0 |  0
1 0 0 0 1 0 0 |  0
1 1 1 0 1 0 1 |  1
0 1 0 1 0 0 1 |  0
1 1 1 1 1 0 1 |  1
1 1 1 1 1 1 1 |  1

Representation size using a decision tree? (A check of the weight setting follows below.)
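
A minimal check (Python sketch, names are mine) of the majority weights against the table above:

    def majority_net(x):
        # All weights 1, bias w0 = -n/2; for odd n this fires exactly
        # when a majority of the bits are on.
        n = len(x)
        return 1 if sum(x) - n / 2 > 0 else 0

    rows = [  # (bits, expected out) from the table above
        ([1,0,1,0,1,0,1], 1), ([0,1,1,0,0,0,1], 0),
        ([0,0,1,0,0,1,0], 0), ([1,0,0,0,1,0,0], 0),
        ([1,1,1,0,1,0,1], 1), ([0,1,0,1,0,0,1], 0),
        ([1,1,1,1,1,0,1], 1), ([1,1,1,1,1,1,1], 1),
    ]
    print(all(majority_net(x) == y for x, y in rows))  # True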

Page 9: Sweet Sixteen?

ab           (~a)+(~b)
a(~b)        (~a)+b
(~a)b        a+(~b)
(~a)(~b)     a+b
a            ~a
b            ~b
1            0
a = b        a exclusive-or b (a ⊕ b)
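
Since each two-input Boolean function is fully described by its four output bits, there are 2^4 = 16 of them; a short Python sketch (the indexing scheme is my own) enumerates all sixteen:

    # Function i outputs bit (2*a + b) of i on input (a, b).
    for i in range(16):
        table = [(i >> (2 * a + b)) & 1 for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))]
        print(f"f{i}: outputs on (0,0),(0,1),(1,0),(1,1) = {table}")
    # f6 is a-xor-b and f9 is a = b: the two functions in this list
    # that turn out not to be linearly separable.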

Page 10: XOR Constraints

A B | out
0 0 |  0     g(w0) < 1/2
0 1 |  1     g(wB + w0) > 1/2
1 0 |  1     g(wA + w0) > 1/2
1 1 |  0     g(wA + wB + w0) < 1/2

So w0 < 0, wA + w0 > 0, wB + w0 > 0, and wA + wB + w0 < 0. Adding the middle two inequalities gives wA + wB + 2w0 > 0, i.e. wA + wB + w0 > -w0 > 0, so 0 < wA + wB + w0 < 0: a contradiction. No choice of weights works.

Page 11: Linearly Separable

XOR is problematic.

[Figure: the XOR points plotted on C and D axes, labeled 0, 0, 1, 1; the "?" marks that no single line separates the 1s from the 0s.]

Page 12: How Represent XOR?

A xor B = (A+B)(~A+~B)

[Figure: a two-layer network over inputs A and B. Hidden unit c1 (the OR) has weights 10, 10 and bias -5; hidden unit c2 (the NAND) has weights -10, -10 and bias 15; the output unit ANDs them with weights 10, 10 and bias -15.]
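
The construction in the figure can be checked directly (a Python sketch using the slide's weights):

    import math

    def g(z):
        return 1.0 / (1.0 + math.exp(-z))

    def xor_net(a, b):
        c1 = g(10*a + 10*b - 5)       # hidden unit c1: A OR B
        c2 = g(-10*a - 10*b + 15)     # hidden unit c2: NOT (A AND B)
        return g(10*c1 + 10*c2 - 15)  # output unit: c1 AND c2

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(xor_net(a, b)))  # prints 0, 1, 1, 0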

Page 13: Requiem for a Perceptron

Rosenblatt proved that a perceptron will learn any linearly separable function.

Minsky and Papert (1969) in Perceptrons: "there is no reason to suppose that any of the virtues carry over to the many-layered version."

Page 14: Backpropagation

Bryson and Ho (1969, the same year) described a training procedure for multilayer networks. It went unnoticed.

Multiply rediscovered in the 1980s.

Page 15: Multilayer Net

[Figure: a multilayer network. Inputs x1, x2, x3, …, xD and a constant 1 feed each hidden unit through weights W11, W12, W13, …; hidden unit j computes net_j and hid_j = g(net_j). The hidden values hid_1, …, hid_H and a constant 1 feed the output unit through weights U_1, …, U_0, producing net_i and out.]

Page 16: Multiple Outputs

Makes no difference for the perceptron.

Add more outputs off the hidden layer in the multilayer case.

Page 17: Output Function

out_i(x) = g(sum_j U_ji g(sum_k W_kj x_k))

H: number of "hidden" nodes

Also:
  • Use more than one hidden layer
  • Use direct input-output weights
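
In matrix form the output function above is just two applications of g (a minimal NumPy sketch; the shapes and names are my own, and bias terms are omitted for brevity):

    import numpy as np

    def g(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W, U):
        # W is D x H (input-to-hidden), U is H x O (hidden-to-output).
        hid = g(x @ W)     # hid_j = g(sum_k W_kj x_k)
        return g(hid @ U)  # out_i = g(sum_j U_ji hid_j)

    D, H, O = 3, 4, 2      # arbitrary sizes, just for illustration
    rng = np.random.default_rng(0)
    print(forward(rng.random(D), rng.normal(size=(D, H)), rng.normal(size=(H, O))))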

Page 18: How Train?

Find a set of weights U, W that minimize

sum_(x,y) sum_i (y_i - out_i(x))^2

using gradient descent.

Incremental version (vs. batch): move the weights a small amount for each training example.

Page 19: Updating Weights

1. Feed-forward to hidden: net_j = sum_k W_kj x_k; hid_j = g(net_j)
2. Feed-forward to output: net_i = sum_j U_ji hid_j; out_i = g(net_i)
3. Update output weights: δ_i = g'(net_i) (y_i - out_i); U_ji += α hid_j δ_i
4. Update hidden weights: δ_j = g'(net_j) sum_i U_ji δ_i; W_kj += α x_k δ_j

(α is the learning-rate step size.)
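
Steps 1-4 translate almost line-for-line into code (a NumPy sketch of one incremental update; biases are omitted, and g'(net) is written via the logistic identity g' = g(1 - g)):

    import numpy as np

    def g(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, y, W, U, alpha=0.1):
        # W, U must be float arrays; they are updated in place.
        net_j = x @ W                               # 1. feed-forward to hidden
        hid = g(net_j)
        net_i = hid @ U                             # 2. feed-forward to output
        out = g(net_i)
        delta_i = out * (1 - out) * (y - out)       # 3. δ_i = g'(net_i)(y_i - out_i)
        delta_j = hid * (1 - hid) * (U @ delta_i)   # 4. δ_j = g'(net_j) sum_i U_ji δ_i
        U += alpha * np.outer(hid, delta_i)         #    U_ji += α hid_j δ_i
        W += alpha * np.outer(x, delta_j)           #    W_kj += α x_k δ_j
        return out

Repeating this over all training pairs is exactly the incremental gradient descent of the previous slide.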

Page 20: Multilayer Net (schema)

[Figure: the chain x_k -> (W_kj) -> net_j -> hid_j -> (U_ji) -> net_i -> out_i, with the error δ_i computed against the target y_i and sent back through U_ji to form δ_j.]

Page 21: Does it Work?

Sort of: lots of practical applications, lots of people play with it. Fun.

However, it can fall prey to the standard problems with local search…

NP-hard to train a 3-node net.

Page 22: Step Size Issues

Too small? Too big?

Page 23: Representation Issues

Any continuous function can be represented by a one-hidden-layer net with sufficient hidden nodes.

Any function at all can be represented by a two-hidden-layer net with a sufficient number of hidden nodes.

What's the downside for learning?

Page 24: Generalization Issues

Pruning weights: "optimal brain damage"

Cross validation

Much, much more to this. Take a class on machine learning.

Page 25: What to Learn

Representing logical functions using sigmoid units

Majority (net vs. decision tree)

XOR is not linearly separable

Adding layers adds expressibility

Backprop is gradient descent

Page 26: Homework 10 (due 12/12)

1. Describe a procedure for converting a Boolean formula in CNF (n variables, m clauses) into an equivalent network. How many hidden units does it have?

2. More soon.