Stochastic Optimization and Simulated Annealing Psychology 85-419/719 January 25, 2001.
In Previous Lecture...
• Discussed constraint satisfaction networks, having:
– Units, weights, and a “goodness” function
• Updating states involves computing input from other units
– Guaranteed to locally increase goodness
– Not guaranteed to globally increase goodness
The General Problem: Local Optima
[Figure: goodness as a function of activation state, with a local optimum and the true (global) optimum marked]
How To Solve the Problem of Local Optima?
• Exhaustive search?
– Nah. Takes too long. n units have 2 to the nth power possible states (if binary).
• Random re-starts?
– Seems wasteful.
• How about something that generally goes in the right direction, with some randomness?
Sometimes It Isn’t Best To Always Go Straight Towards The Goal
• Rubik’s Cube: Undo some moves in order to make progress
• Baseball: sacrifice fly
• Navigation: move away from goal, to get around obstacles
Randomness Can Help Us Escape Bad Solutions
[Figure: goodness landscape over activation state; a random jump carries the state out of a local optimum]
So, How Random Do We Want to Be?
• We can take a cue from physical systems
• In metallurgy, metals can reach a very strong (stable) state by:
– Melting it; scrambles molecular structure
– Gradually cooling it
– Resulting molecular structure very stable
• New terminology: reduce energy (which is kind of like the negative of goodness)
Simulated Annealing
p[a_i = 1] = 1 / (1 + e^(-net_i / T))

The odds that a unit is on are a function of:
– The input to the unit, net_i
– The temperature, T
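This update rule can be sketched in a few lines of Python (the function names are illustrative, not from the lecture):

```python
import math
import random

def p_on(net, T):
    """Probability that a unit's output is 1, given net input and temperature T."""
    return 1.0 / (1.0 + math.exp(-net / T))

def update_unit(net, T, rng=random):
    """Stochastically set the unit's state: 1 with probability p_on(net, T), else 0."""
    return 1 if rng.random() < p_on(net, T) else 0
```

Note that at net = 0 the unit comes on half the time, no matter what the temperature is.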
Picking it Apart...
• As net increases, probability that output is 1 increases
– e is raised to the negative of net/T; so as net gets big, e to the negative of net/T goes to zero. So probability goes to 1/1 = 1.
p[a_i = 1] = 1 / (1 + e^(-net_i / T))
The Temperature Term
• When T is big, the exponent for e goes to zero.
• e (or anything) to the zero power is 1
• So, the odds the output is 1 go to 1/(1+1) = 0.5
p[a_i = 1] = 1 / (1 + e^(-net_i / T))
The Temperature Term (2)
p[a_i = 1] = 1 / (1 + e^(-net_i / T))
• When T gets small, exponent gets big.
• Effect of net becomes amplified.
Different Temperatures...
[Figure: probability the output is 1 (from 0 to 1, with .5 marked) as a function of net input, for high, medium, and low temperature; the lower the temperature, the steeper the curve]

p[a_i = 1] = 1 / (1 + e^(-net_i / T))
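A quick numeric sketch of those three curves, reusing the update rule (temperatures chosen arbitrarily for illustration):

```python
import math

def p_on(net, T):
    return 1.0 / (1.0 + math.exp(-net / T))

# At high T all probabilities hover near 0.5; at low T the curve
# approaches a step function of the net input.
for T in (10.0, 1.0, 0.1):          # high, medium, low temperature
    row = [round(p_on(net, T), 3) for net in (-2.0, 0.0, 2.0)]
    print(f"T={T}: {row}")
```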
Ok, So At What Rate Do We Reduce Temperature?
In general, must decrease it very slowly to guarantee convergence to the global optimum:

T(t) = c / log(1 + t)

[Figure: T(t) falling slowly as t goes from 0 to 100]

In practice, we can get away with a more aggressive annealing schedule.
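The two schedules can be compared directly; the constants c, T0, and alpha below are arbitrary choices for illustration:

```python
import math

def log_schedule(t, c=10.0):
    """Theoretical guarantee: T(t) = c / log(1 + t). Cools extremely slowly.
    (Start at t >= 1; at t = 0 the denominator log(1) is zero.)"""
    return c / math.log(1.0 + t)

def geometric_schedule(t, T0=10.0, alpha=0.95):
    """Common practical schedule: multiply T by alpha each step."""
    return T0 * alpha ** t

# After 100 steps the log schedule is still hot; the geometric one is nearly frozen.
print(log_schedule(100), geometric_schedule(100))
```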
Putting it Together...
• We can represent facts, etc. as units
• Knowledge about these facts encoded as weights
• Network processing fills in gaps, makes inferences, forms interpretations
• Stable Attractors form; the weights and input sculpt these attractors.
• Stability (and goodness) enhanced with randomness in updating process.
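Putting the pieces together, here is a minimal annealing run on a toy three-unit network. The weights and cooling schedule are invented for illustration; with these weights the best state is a = [1, 1, 0], with goodness 2 (units 0 and 1 support each other, unit 2 conflicts with both).

```python
import math
import random

# Toy constraint network: symmetric weights between three binary units.
# goodness(a) = sum over pairs i < j of W[i][j] * a[i] * a[j]
W = [[ 0,  2, -1],
     [ 2,  0, -1],
     [-1, -1,  0]]

def goodness(a):
    n = len(a)
    return sum(W[i][j] * a[i] * a[j] for i in range(n) for j in range(i + 1, n))

def net_input(a, i):
    return sum(W[i][j] * a[j] for j in range(len(a)) if j != i)

def anneal(steps=300, T0=5.0, alpha=0.97, seed=0):
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(len(W))]
    T = T0
    for _ in range(steps):
        i = rng.randrange(len(a))                       # pick a unit at random
        p = 1.0 / (1.0 + math.exp(-net_input(a, i) / T))
        a[i] = 1 if rng.random() < p else 0             # stochastic update
        T = max(T * alpha, 0.05)  # aggressive geometric cooling, floored to avoid overflow
    return a
```

A run like `state = anneal(); print(state, goodness(state))` will usually end in the best state, since by the final steps the temperature is low enough that updates are nearly deterministic.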
Stable Attractors Can Be Thought Of As Memories
• How many stable patterns can be remembered by a network with N units?
• There are 2 to the N possible patterns…
• … but only about 0.15*N will be stable
• To remember 100 things, need 100/0.15 ≈ 667 units!
• (then again, the brain has about 10 to the 12th power neurons…)
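The capacity arithmetic, using the slide's ~0.15*N estimate (rounding up, since 666 units would fall just short of 100 stable patterns):

```python
import math

def units_needed(n_patterns, capacity_ratio=0.15):
    """Units required so that capacity_ratio * N >= n_patterns."""
    return math.ceil(n_patterns / capacity_ratio)

print(units_needed(100))   # 100 / 0.15 = 666.67, so 667 units
```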
Human Performance, When Damaged (some examples)
• Category coordinate errors
– Naming a CAT as a DOG
• Superordinate errors
– Naming a CAT as an ANIMAL
• Visual errors (deep dyslexics)
– Naming SYMPATHY as SYMPHONY
– or, naming SYMPATHY as ORCHESTRA
The Attractors We’ve Talked About Can Be Useful In Understanding This
[Figure: attractor landscapes for the words CAT and COT; left panel, normal performance; right panel, a visual error in which COT falls into the “CAT” attractor]
(see Plaut, Hinton, & Shallice)
Properties of Human Memory
• Details tend to go first, more general things next. Not all-or-nothing forgetting.
• Things tend to be forgotten, based on:
– Salience
– Recency
– Complexity
– Age of acquisition?
Do These Networks Have These Properties?
• Sort of.
• Graceful degradation. Features vanish as a function of strength of input to them.
• Complexity: more complex / arbitrary patterns can be more difficult to retain
• Salience, recency, age of acquisition?
– Depends on learning rule. Stay tuned
Next Time: Psychological Implications:
The IAC Model of Word Perception
• Optional reading: McClelland and Rumelhart ‘81 (handout)
• Rest of this class: Lab session. Help installing software, help with homework.