Stochastic Optimization and Simulated Annealing Psychology 85-419/719 January 25, 2001.
In Previous Lecture...
• Discussed constraint satisfaction networks, having:
– Units, weights, and a “goodness” function
• Updating states involves computing input from other units
– Guaranteed to locally increase goodness
– Not guaranteed to globally increase goodness
The General Problem: Local Optima
[Figure: goodness as a function of activation state, with a local optimum and the true (global) optimum marked]
How To Solve the Problem of Local Optima?
• Exhaustive search?
– Nah. Takes too long. n units have 2 to the nth power possible states (if binary).
• Random re-starts?
– Seems wasteful.
• How about something that generally goes in the right direction, with some randomness?
Sometimes It Isn’t Best To Always Go Straight Towards The Goal
• Rubik’s Cube: Undo some moves in order to make progress
• Baseball: sacrifice fly
• Navigation: move away from goal, to get around obstacles
Randomness Can Help Us Escape Bad Solutions
[Figure: goodness landscape over activation state; a random jump carries the state out of a local optimum]
So, How Random Do We Want to Be?
• We can take a cue from physical systems
• In metallurgy, metals can reach a very strong (stable) state by:
– Melting it; scrambles molecular structure
– Gradually cooling it
– Resulting molecular structure very stable
• New terminology: reduce energy (which is kind of like the negative of goodness)
Simulated Annealing
p[a_i = 1] = 1 / (1 + e^(-net_i / T))

The odds that a unit is on are a function of:
– The input to the unit, net_i
– The temperature, T
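This update rule can be sketched in a few lines of Python (the function names are illustrative, not from the lecture):

```python
import math
import random

def p_on(net, T):
    """Probability that a unit's output is 1, given net input and temperature T."""
    return 1.0 / (1.0 + math.exp(-net / T))

def update_unit(net, T, rng=random):
    """Stochastically set the unit's state: 1 with probability p_on(net, T), else 0."""
    return 1 if rng.random() < p_on(net, T) else 0
```

Note that at net = 0 the unit comes on half the time, no matter what the temperature is.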
Picking it Apart...
• As net increases, probability that output is 1 increases
– e is raised to the negative of net/T; so as net gets big, e to the negative of net/T goes to zero. So probability goes to 1/1 = 1.
p[a_i = 1] = 1 / (1 + e^(-net_i / T))
The Temperature Term
• When T is big, the exponent for e goes to zero.
• e (or anything) to the zero power is 1
• So, the odds the output is 1 go to 1/(1+1) = 0.5
p[a_i = 1] = 1 / (1 + e^(-net_i / T))
The Temperature Term (2)
p[a_i = 1] = 1 / (1 + e^(-net_i / T))
• When T gets small, exponent gets big.
• Effect of net becomes amplified.
Different Temperatures...
[Figure: probability the output is 1 (from 0 to 1, with .5 marked) as a function of net input, for high, medium, and low temperature; the lower the temperature, the steeper the curve]

p[a_i = 1] = 1 / (1 + e^(-net_i / T))
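A quick numeric sketch of those three curves, reusing the update rule (temperatures chosen arbitrarily for illustration):

```python
import math

def p_on(net, T):
    return 1.0 / (1.0 + math.exp(-net / T))

# At high T all probabilities hover near 0.5; at low T the curve
# approaches a step function of the net input.
for T in (10.0, 1.0, 0.1):          # high, medium, low temperature
    row = [round(p_on(net, T), 3) for net in (-2.0, 0.0, 2.0)]
    print(f"T={T}: {row}")
```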
Ok, So At What Rate Do We Reduce Temperature?
In general, must decrease it very slowly to guarantee convergence to the global optimum:

T(t) = c / log(1 + t)

[Figure: T(t) falling slowly as t goes from 0 to 100]

In practice, we can get away with a more aggressive annealing schedule.
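The two schedules can be compared directly; the constants c, T0, and alpha below are arbitrary choices for illustration:

```python
import math

def log_schedule(t, c=10.0):
    """Theoretical guarantee: T(t) = c / log(1 + t). Cools extremely slowly.
    (Start at t >= 1; at t = 0 the denominator log(1) is zero.)"""
    return c / math.log(1.0 + t)

def geometric_schedule(t, T0=10.0, alpha=0.95):
    """Common practical schedule: multiply T by alpha each step."""
    return T0 * alpha ** t

# After 100 steps the log schedule is still hot; the geometric one is nearly frozen.
print(log_schedule(100), geometric_schedule(100))
```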
Putting it Together...
• We can represent facts, etc. as units
• Knowledge about these facts encoded as weights
• Network processing fills in gaps, makes inferences, forms interpretations
• Stable Attractors form; the weights and input sculpt these attractors.
• Stability (and goodness) enhanced with randomness in updating process.
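Putting the pieces together, here is a minimal annealing run on a toy three-unit network. The weights and cooling schedule are invented for illustration; with these weights the best state is a = [1, 1, 0], with goodness 2 (units 0 and 1 support each other, unit 2 conflicts with both).

```python
import math
import random

# Toy constraint network: symmetric weights between three binary units.
# goodness(a) = sum over pairs i < j of W[i][j] * a[i] * a[j]
W = [[ 0,  2, -1],
     [ 2,  0, -1],
     [-1, -1,  0]]

def goodness(a):
    n = len(a)
    return sum(W[i][j] * a[i] * a[j] for i in range(n) for j in range(i + 1, n))

def net_input(a, i):
    return sum(W[i][j] * a[j] for j in range(len(a)) if j != i)

def anneal(steps=300, T0=5.0, alpha=0.97, seed=0):
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(len(W))]
    T = T0
    for _ in range(steps):
        i = rng.randrange(len(a))                       # pick a unit at random
        p = 1.0 / (1.0 + math.exp(-net_input(a, i) / T))
        a[i] = 1 if rng.random() < p else 0             # stochastic update
        T = max(T * alpha, 0.05)  # aggressive geometric cooling, floored to avoid overflow
    return a
```

A run like `state = anneal(); print(state, goodness(state))` will usually end in the best state, since by the final steps the temperature is low enough that updates are nearly deterministic.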
Stable Attractors Can Be Thought Of As Memories
• How many stable patterns can be remembered by a network with N units?
• There are 2 to the N possible patterns…
• … but only about 0.15*N will be stable
• To remember 100 things, need 100/0.15 ≈ 667 units!
• (then again, the brain has about 10 to the 12th power neurons…)
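The capacity arithmetic, using the slide's ~0.15*N estimate (rounding up, since 666 units would fall just short of 100 stable patterns):

```python
import math

def units_needed(n_patterns, capacity_ratio=0.15):
    """Units required so that capacity_ratio * N >= n_patterns."""
    return math.ceil(n_patterns / capacity_ratio)

print(units_needed(100))   # 100 / 0.15 = 666.67, so 667 units
```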
Human Performance, When Damaged (some examples)
• Category coordinate errors
– Naming a CAT as a DOG
• Superordinate errors
– Naming a CAT as an ANIMAL
• Visual errors (deep dyslexics)
– Naming SYMPATHY as SYMPHONY
– or, naming SYMPATHY as ORCHESTRA
The Attractors We’ve Talked About Can Be Useful In Understanding This
[Figure: attractor landscapes for the words CAT and COT; left panel, normal performance; right panel, a visual error in which COT falls into the “CAT” attractor]
(see Plaut, Hinton, & Shallice)
Properties of Human Memory
• Details tend to go first, more general things next. Not all-or-nothing forgetting.
• Things tend to be forgotten, based on:
– Salience
– Recency
– Complexity
– Age of acquisition?
Do These Networks Have These Properties?
• Sort of.
• Graceful degradation. Features vanish as a function of strength of input to them.
• Complexity: more complex / arbitrary patterns can be more difficult to retain
• Salience, recency, age of acquisition?
– Depends on learning rule. Stay tuned
Next Time: Psychological Implications:
The IAC Model of Word Perception
• Optional reading: McClelland and Rumelhart ‘81 (handout)
• Rest of this class: Lab session. Help installing software, help with homework.