Optimizing online learning capacity in a biologically-inspired neural network Janelia Farm, March...
Optimizing online learning capacity in a biologically-inspired neural
network
Xundong Wu, Neuroscience Graduate Program, University of Southern California
Advisor: Bartlett Mel
Synaptic plasticity, online learning
Lee, Huang et al. (2005)
Synaptic basis of learning and memory: Hebb (1949); Bliss & Lømo (1973); Bliss & Gardner-Medwin (1973); Levy & Steward (1983); Lynch, Larson, Kelso, Barrionuevo & Schottler (1983); Lynch & Baudry (1984); Morris, Anderson, Lynch & Baudry (1986); Malenka (1988); Ito (1989); Bliss & Collingridge (1993); Malenka (1994); Frey & Morris (1997)
Online learning models: Nadal, Toulouse, Changeux & Dehaene (1986); Amit & Fusi (1994); Henson & Willshaw (1995); Norman & O'Reilly (2003); Fusi, Drew & Abbott (2005); Fusi & Abbott (2007); Greve, Sterratt, Donaldson, Willshaw & van Rossum (2008)
point neuron
The network: axons crossing through dendrites, making synapses
“Pattern”
A pattern is a set of activated axons
Dendrites are the unit of learning
We assume neurons have separately thresholded dendritic “subunits”
Compartmentalized firing: Herreras (1990); Kim & Connors (1993); Schiller et al (1997); Kamondi et al (1998); Larkum et al (1999); Helmchen et al (1999); Golding et al (2002); Schiller et al (2000); Losonczy & Magee (2006); Sobczyk & Svoboda (2007); Major et al (2008); Remy et al (2009); Larkum et al (2009)
Compartmentalized plasticity: Golding et al (2002); Frey & Morris (1997); Harvey & Svoboda (2007); Bollmann & Engert (2009); Govindarajan, Israely et al (2011)
Key examples: Larkum et al (1999); Losonczy & Magee (2006); Harvey & Svoboda (2007)
[Figure: each dendrite has a learning threshold θL and a firing threshold θF; a trained pattern drives the dendrite above θF. A recognition threshold θR (no/yes) separates untrained responses from trained responses, set to give 1% false negatives and 1% false positives.]
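As a concrete illustration of the subunit assumption, a dendrite can be modeled as a thresholded sum of its synaptic inputs (a minimal sketch; the weights, pattern, and threshold value below are hypothetical, not taken from the talk):

```python
# Minimal sketch of a separately thresholded dendritic "subunit":
# a dendrite sums the weights of its activated synapses and "fires"
# only if that sum crosses its own firing threshold (theta_F).

def dendrite_fires(active_inputs, weights, theta_F):
    """Return True if the dendrite's summed synaptic drive reaches theta_F."""
    drive = sum(weights[i] for i in active_inputs)
    return drive >= theta_F

# Hypothetical example: one dendrite with 8 synapses, 3 of them strong.
weights = [0, 1, 0, 0, 1, 0, 1, 0]   # 1 = strong synapse, 0 = weak
pattern = {1, 4, 6}                   # a "pattern" = set of activated axons
print(dendrite_fires(pattern, weights, theta_F=3))   # hits all 3 strong -> True
```

A 2-layer neuron in this framework is then a second thresholding stage over its subunits' outputs.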
We previously showed storage capacity is maximized when:
• Synaptic plasticity is extremely sparse
• Synaptic plasticity is dendrite-specific
• Patterns are stored by strengthening synapses; synaptic potentiation should be governed by both presynaptic (θLpre) and postsynaptic (θLpost) learning thresholds
• Patterns are forgotten by weakening synapses; synaptic depression should occur in the least recently strengthened (i.e. “oldest”) synapses

Wu, X. E. and B. W. Mel (2009). "Capacity-enhancing synaptic learning rules in a medial temporal lobe online learning model." Neuron 62(1): 31-41.
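The dual-threshold gating and oldest-first depression described above can be sketched in Python (my own illustration of the rule as stated, not the thesis implementation; all names and values are hypothetical):

```python
# Sketch of the capacity-enhancing learning rule: potentiation is gated by
# BOTH a presynaptic and a postsynaptic learning threshold, and depression
# targets the least recently strengthened ("oldest") synapses.

from collections import deque

def maybe_potentiate(active, strong, age_queue, theta_L_pre, theta_L_post):
    """Strengthen a dendrite's active weak synapses if both gates pass."""
    pre_drive = len(active)               # how strongly the dendrite is driven
    post_drive = len(active & strong)     # drive through already-strong synapses
    if pre_drive >= theta_L_pre and post_drive >= theta_L_post:
        for s in active - strong:
            strong.add(s)
            age_queue.appendleft(s)       # newly strengthened synapses are "young"

def depress_oldest(strong, age_queue):
    """Forgetting: weaken the least recently strengthened synapse."""
    oldest = age_queue.pop()
    strong.discard(oldest)

# Hypothetical example on one dendrite (synapses named by index):
strong = {2, 5}
ages = deque([5, 2])                      # left = young, right = old
maybe_potentiate({1, 2, 5, 7}, strong, ages, theta_L_pre=4, theta_L_post=2)
print(sorted(strong))                     # -> [1, 2, 5, 7]
```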
What dendritic morphology maximizes the ability of this type of network to learn?
• Why do neurons in memory areas of the brain grow dendrites of particular sizes?
• Knowing how memory capacity depends on dendritic size will help us understand which changes in morphology (e.g. spine density and/or dendritic length) are most disruptive to memory function
[Figure: dendritic morphology, Control vs. Down syndrome (Dierssen, Benavides-Piccione et al 2003) and Control vs. Fragile X (Irwin & Patel et al 2000)]
The central question
Given N total synapses, how does capacity depend on dendrite size?
[Figure: capacity vs. # of synapses per dendrite (K), from 10 to 10,000; the shape of this curve is the unknown]
The parameters of my study
1. Pattern activation density: from 1.5% to 6% (e.g. 256 synapses × 3% ≈ 8 active synapses, on average)
2. Noise level: low, medium, high (at low noise a dendrite sees close to the average of 8 active synapses; at high noise, counts like 5 or 12 are common)
3. Correlations: small, medium, large (each axon activates from 200 to 10,000 synapses, introducing correlations between dendrites)
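Under independent activation, the per-dendrite count of active synapses is binomial, which is where the 256 × 3% ≈ 8 figure comes from; a quick check (a sketch assuming independent activation, not thesis code):

```python
import math

def binom_pmf(x, K, p):
    """P(exactly x of a dendrite's K synapses are active) given density p."""
    return math.comb(K, x) * p**x * (1 - p)**(K - x)

K, p = 256, 0.03
mean = K * p                         # expected number of active synapses
sd = math.sqrt(K * p * (1 - p))      # spread around that mean
print(round(mean, 1))                # -> 7.7, i.e. ~8 active on average
print(round(sd, 1))                  # -> 2.7, so counts like 5 or 12 occur often
```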
Responses to untrained patterns (K = 256):
[Figure: a dendrite with K = 256 synapses, each either weak (0) or strong (1); P denotes the response distribution]
Responses to untrained patterns (K = 256, M = 20,000 dendrites):
[Figure: number of dendrites vs. summed pre- and post-synaptic activation; a dendrite fires when its activation crosses the firing threshold θF]
With PF = 0.1% (the probability of a dendrite firing), 20,000 dendrites × 0.1% = 20 dendrites fire per untrained pattern. A recognition threshold θR separates trained responses from untrained responses; at this PF, the training cost is 35 dendrites per pattern.
Lowering the firing probability to PF = 0.02% reduces the training cost to 18 dendrites per pattern, and lowering it further to PF = 5×10⁻⁷ allows patterns to be stored with only PL·M = 5 trained dendrites. Learning is gated by separate presynaptic (θLpre) and postsynaptic (θLpost) learning thresholds.
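How the firing threshold relates to the dendrite firing probability PF can be illustrated with the binomial tail (a simplified sketch that counts active synapses and ignores weights and noise; the helper functions are my own, only the PF values follow the slides):

```python
import math

def tail(theta, K, p):
    """P(an untrained dendrite activates >= theta of its K synapses)."""
    return sum(math.comb(K, x) * p**x * (1 - p)**(K - x)
               for x in range(theta, K + 1))

def threshold_for(PF_target, K, p):
    """Smallest integer firing threshold whose background rate is <= PF_target."""
    theta = 0
    while tail(theta, K, p) > PF_target:
        theta += 1
    return theta

# The PF values stepped through in the slides:
K, p = 256, 0.03
for PF in (1e-3, 2e-4, 5e-7):
    print(PF, threshold_for(PF, K, p))   # lower PF -> higher threshold
```

The point of the exercise: driving PF down by orders of magnitude costs only a modest increase in threshold, because the binomial tail falls off steeply.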
Strengthening (or refreshing) a synapse makes it young; depression targets the oldest synapses.
[Figure: a dendrite's weak synapses (unordered) alongside its strong synapses, which are ordered in an age queue from “young” to “old”]
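The young/old bookkeeping can be sketched with a double-ended queue (an illustration of the rule as stated on the slide, not the thesis code; the synapse labels are hypothetical):

```python
from collections import deque

# Age queue over one dendrite's strong synapses: strengthening (or refreshing)
# moves a synapse to the "young" end; depression pops from the "old" end.

queue = deque()                      # left = young, right = old

def strengthen(s):
    if s in queue:
        queue.remove(s)              # refreshed: its old position is forgotten
    queue.appendleft(s)              # ...and it becomes the youngest

def depress_oldest():
    return queue.pop()               # least recently strengthened synapse

for s in ("a", "b", "c"):
    strengthen(s)                    # ages now: young c, b, a old
strengthen("a")                      # refresh "a" -> it becomes youngest again
print(depress_oldest())              # -> b (oldest once "a" was refreshed)
```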
The distribution of synapse ages is determined by a geometric decay process.
[Figure: synapse age distribution for K = 256; measured capacity = 26,500 patterns, vs. a naive calculation of 16,500 patterns]
And now for the capacity curve... does the simple model predict the simulations?
[Figure: simulation capacity vs. analytical capacity, K = 256]
Why are the predicted and actual capacities different?
1. Simulations assume a soft dendritic firing threshold.
2. In simulations, each axon makes multiple synaptic contacts, which leads to correlations.
3. In simulations, variance in dendrite decay times leads to premature response failures.
[Figure: model vs. simulation]
A floor effect leads to reduced capacity for neurons with large dendrites.
[Figure: response distribution for K = 1024, with recognition threshold θR separating untrained from trained responses; capacity plotted against dendrite usage and synapse usage]
Area under the tail (K = 64):
[Figure: number of dendrites vs. pre- and post-synaptic activation, with learning threshold θLpre and firing threshold θF; PL = 0.00001 (probability a dendrite learns), PF = 0.000000125 (probability a dendrite fires); trained vs. untrained response distributions]
• With these thresholds, readout of trained responses yields 45% false negatives.
• Lowering the thresholds brings the error rate back to 1%, but 30% of trained dendrites then suffer readout failure.
• Once readout failures are included, the error rate is again too high; compensating requires training 20 dendrites per pattern.
Short dendrites suffer from (1) the dendrite availability problem, and (2) high readout failure rates, both of which increase the number of dendrites needed to store a pattern.
[Figure: dendrite usage (UD) vs. capacity for K = 64 and K = 256. The minimum requirement is 7 dendrites in both cases. For K = 64, +4 dendrites are needed to increase dendrite availability and +9 to compensate for readout error, for a total of 20; for K = 256, only +1 is needed to compensate for readout error, for a total of 8.]
dendrite usage
prediction
syna
pse
usag
e
capacity
dendrite usage
actualprediction
syna
pse
usag
e
capacity
dendrite usage
K=64
K=256
[Figure: pre- and post-synaptic activation distributions with learning threshold θLpre; with an average of K × 3% synapses activated (8 vs. 2), the F factor is 0.14 vs. 0.08]
[Figure: predicted vs. actual synapse usage and capacity as a function of the F factor; synapse usage and capacity for K = 64 and K = 256, with US = 160 and N = K × M = 5,120,000]
Assume constant synapse usage US.
[Figure: synapse usage and capacity for K = 64 and K = 256]
Summary of results
• Long dendrites are more expensive than short dendrites.
• Short dendrites suffer from: the dendrite availability problem, readout failure, and a high F factor.
[Figure: capacity vs. dendrite usage and synapse usage]
Higher activation density decreases capacity and favors small dendrites.
Short dendrites are more sensitive to noise.
[Figure: capacity (×1000) under low, medium, and high noise]
Correlations lead to increased deviations in dendritic activation.
[Figure: K = 1024; small, medium, and high correlation, on linear and logarithmic scales]
Correlations reduce memory capacity.
[Figure: capacity under small, medium, and high correlation]
Duplication avoidance reduces the deviation of in-dendrite summation.
[Figure: distribution P of in-dendrite summation]
Duplication avoidance rescues capacity from the effects of signal correlation.
[Figure: capacity with and without decorrelation, under small, medium, and high correlation]
Evidence for duplication avoidance?
Branco, T. and K. Staras (2009). "The probability of neurotransmitter release: variability and feedback control at single synapses." Nat Rev Neurosci 10(5): 373-83.
What we learned about online learning in a neural context
Conclusion
For an online (sequential, one-shot) recognition memory containing
• a large number of synapses formed onto
• 2-layer neurons containing separately thresholded dendritic subunits
Storage capacity is maximized by
• turning up the dendritic firing threshold very high to
keep background firing rates extremely low, which allows for
stored traces to be extremely weak with
very few synapses consumed per pattern so that
each synapse is very rarely used and can grow “old” before it is deleted
and
• storing patterns by an LTP-like operation gated by
dual learning thresholds requiring both
1. strong pre-synaptic activation of the dendrite
i.e. having many axons driving the synapses and
2. strong post-synaptic activation of the dendrite
i.e. where many of the activated synapses are already strong, and
• protecting strengthened (or refreshed) synapses by a tag that ages so that
• homeostatic depression is limited to the least-recently trained synapses
• choosing a dendritic morphology with dendrites of “medium” size, that is
as short as possible to minimize synapse usage per dendrite but
as long as possible to reduce within-dendrite variability that
1. forces dendrites to have excessively high thresholds, leading to
availability problems, that in turn requires
lowering firing thresholds which
raises background firing rates which
lowers capacity
2. leads to excessively high readout failure rates that again requires
lowering dendritic thresholds to
increase memory trace strength which again
increases storage costs per pattern which
lowers capacity
3. leads to high synapse usage ratio (F) per dendrite producing
unfavorable conditions for geometric decay of age queues, leading to
shorter information survival times in dendrites which
lowers capacity
How this will be helpful
We now understand better how properties of dendrites, including
• their sizes
• their learning and firing thresholds
• their plasticity rules, including
the rules and mechanisms governing “LTP”
the rules and mechanisms governing “LTD”
relate to memory capacity.
1. Scientific contribution
2. Practical/translational contribution
Our improved understanding will
• Help us to propose experiments to test a variety of predictions that arise from this model
• Help us to interpret the significance of changes to neurons that disrupt memory function in aging, stress, and neurological disorders
Acknowledgment
Committee members
Dr. Bartlett Mel
Dr. Michel Baudry
Dr. Manbir Singh
Dr. Fritz Sommer
Dr. Li Zhang
Lab members
Bardia Behabadi and DJ Strouse
Monika Jadi, Rishabh Jain, Chaithanya Ramachandra, Yichun Wei
The end
Normalized pre/post-synaptic summation

B(x; K, PA)
B(x·μ; K, PA) / max(B(x; K, PA)), where μ = K·PA

B(x; KS, PA)
B(x·μS; KS, PA) / max(B(x; KS, PA)), where μS = KS·PA and KS = K/2
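These normalized binomial curves can be reproduced directly (a sketch following the slide's symbols; the evaluation point is illustrative, not from the talk):

```python
import math

def B(x, K, PA):
    """Binomial pmf: probability of x active synapses out of K at density PA."""
    return math.comb(K, x) * PA**x * (1 - PA)**(K - x)

def B_normalized(u, K, PA):
    """pmf evaluated at u * mu (mu = K * PA), scaled so the peak equals 1."""
    mu = K * PA
    peak = max(B(x, K, PA) for x in range(K + 1))
    return B(round(u * mu), K, PA) / peak

# Illustrative evaluation at the mean (u = 1) for K = 256, PA = 3%:
print(B_normalized(1.0, 256, 0.03))    # close to, but just below, the peak of 1
```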
[Backup figure: simulation capacity vs. analytical capacity]
Dendrites must cross two thresholds to learn
[Figure: thresholds plotted against (DA) and # of strong active synapses (DB); Wu & Mel 2009]
Retrieval failure further favors long dendrites.
Theoretical calculation of capacity under different activation densities
Online learning memory is a sequential process; memory traces saturate.
[Figure: theoretical analysis for K = 2048, K = 128, and K = 32]
Second pressure on short dendrites: smaller dendrite size is associated with a shorter synapse age queue depth (assuming constant synapse usage US, U ≈ 160).
[Figure: K = 32; RT = θLpre/(K·F)]
[Backup figure: simulation capacity vs. analytical capacity, K = 100 and K = 50]