Learning and Synaptic Plasticity
Levels of Information Processing in the Nervous System
• Molecules: 0.01 µm
• Synapses: 1 µm
• Neurons: 100 µm
• Local networks: 1 mm
• Areas / „maps“: 1 cm
• Sub-systems: 10 cm
• CNS: 1 m
Structure of a Neuron:
• At the dendrites the incoming signals arrive (incoming currents).
• At the soma the currents are finally integrated.
• At the axon hillock action potentials are generated if the potential crosses the membrane threshold.
• The axon transmits (transports) the action potential to distant sites.
• At the synapses the outgoing signals are transmitted onto the dendrites of the target neurons.
Schematic Diagram of a Synapse:
[Figure: axon terminal with vesicles releasing transmitter onto receptors (≈ channels) on the dendrite.]
Transmitter, receptors, vesicles, channels, etc. together determine the synaptic weight 𝒘.
Overview over different methods
[Figure: a map relating Machine Learning, Classical Conditioning and Synaptic Plasticity.]
EVALUATIVE FEEDBACK (Rewards), REINFORCEMENT LEARNING (example based): Dynamic Programming (Bellman Eq.), Monte Carlo Control, Q-Learning, SARSA, TD(λ) (often λ = 0), TD(1), TD(0), Actor/Critic (technical & basal ganglia), neuronal TD models („Critic“), neuronal TD formalism, Rescorla/Wagner, neuronal reward systems (basal ganglia).
NON-EVALUATIVE FEEDBACK (Correlations), UNSUPERVISED LEARNING (correlation based): Hebb rule, differential Hebb rule („fast“ and „slow“), eligibility traces, ISO-learning, ISO-model of STDP, ISO-control, correlation-based control (non-evaluative), STDP models (biophysical & network), biophysics of synaptic plasticity (dopamine, glutamate), LTP (LTD = anti). The δ-rule belongs to supervised learning.
Common themes: anticipatory control of actions and prediction of values; correlation of signals.
Different Types/Classes of Learning
Unsupervised learning (non-evaluative feedback):
• Trial-and-error learning.
• No error signal.
• No influence from a teacher; correlation evaluation only.
Reinforcement learning (evaluative feedback):
• (Classical & instrumental) conditioning, reward-based learning.
• „Good-bad“ error signals.
• The teacher defines what is good and what is bad.
Supervised learning (evaluative error-signal feedback):
• Teaching, coaching, imitation learning, learning from examples, and more.
• Rigorous error signals.
• Direct influence from a teacher/teaching signal.
An unsupervised learning rule, the basic Hebb rule. For learning: one input, one output:
dwᵢ/dt = µ uᵢ v,  µ << 1

A supervised learning rule (delta rule): no input, no output, one error-function derivative, where the error function compares input with output examples:
wᵢ → wᵢ − µ ∂E/∂wᵢ

A reinforcement learning rule (TD-learning): one input, one output, one reward:
wᵢ → wᵢ + µ [r(t + 1) + γ v(t + 1) − v(t)] ū(t)
where ū is the traced input.
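As a minimal sketch (not part of the original slides), the three update rules can be written in Python; all parameter values, and the quadratic error function behind the delta rule, are illustrative assumptions:

```python
import numpy as np

mu, gamma = 0.01, 0.9  # learning rate (mu << 1) and discount factor, illustrative

def hebb_step(w, u, v):
    """Unsupervised (Hebb): dw_i/dt = mu * u_i * v."""
    return w + mu * u * v

def delta_step(w, u, v_target):
    """Supervised (delta rule): w_i <- w_i - mu * dE/dw_i,
    assuming E = 0.5 * (v_target - w.u)^2 for a linear unit."""
    v = w @ u
    return w + mu * (v_target - v) * u

def td_step(w, u_trace, r_next, v_next, v_now):
    """Reinforcement (TD): w_i <- w_i + mu * [r(t+1) + gamma*v(t+1) - v(t)] * u_trace_i."""
    return w + mu * (r_next + gamma * v_next - v_now) * u_trace

w = np.array([0.5, 0.2])
u = np.array([1.0, 0.0])
w_hebb  = hebb_step(w, u, v=1.0)
w_delta = delta_step(w, u, v_target=1.0)
w_td    = td_step(w, u, r_next=1.0, v_next=0.0, v_now=0.0)
```

Only the weight of the active input changes in all three cases; what differs is the signal (output, error, or reward prediction error) that gates the change.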
The influence of the type of learning on speed and autonomy of the learner (along the axes of learning speed and autonomy):
• Correlation-based learning: no teacher.
• Reinforcement learning: indirect influence.
• Reinforcement learning: direct influence.
• Supervised learning: teacher.
• Programming.
Hebbian learning
[Figure: neuron A synapses onto neuron B; their activities are shown over time t.]
„When an axon of cell A excites cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells so that A's efficiency ... is increased.“
Donald Hebb (1949)
[Overview diagram of the different methods repeated. „You are here!“]
Hebbian Learning
The basic Hebb rule correlates inputs with outputs:
dw₁/dt = µ v u₁,  µ << 1
[Figure: a single unit with input u₁, weight w₁ and output v.]
Vector notation. Cell activity: v = w · u
This is a dot product, where w is the weight vector and u the input vector. Strictly we need to assume that weight changes are slow, otherwise this turns into a differential equation.
Single input:
dw₁/dt = µ v u₁,  µ << 1
Many inputs:
dw/dt = µ v u,  µ << 1
As v is a single output, it is a scalar.
Averaging inputs:
dw/dt = µ ⟨v u⟩,  µ << 1
We can just average over all input patterns and approximate the weight change by this. Remember, this assumes that weight changes are slow.
If we replace v with w · u we can write:
dw/dt = µ Q · w,  where Q = ⟨u uᵀ⟩ is the input correlation matrix.
Note: Hebb yields an unstable (always growing) weight vector!
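A short numerical sketch of the averaged rule (the 2-dimensional input statistics and all constants are illustrative assumptions) shows the instability: the weight norm grows without bound, fastest along the leading eigenvector of Q:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.01

# Input patterns with more variance along the first axis; Q = <u u^T>.
U = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [0.0, 0.5]])
Q = U.T @ U / len(U)  # input correlation matrix

w = np.array([0.1, 0.1])
norms = []
for _ in range(200):
    w = w + mu * Q @ w  # averaged Hebb: dw/dt = mu * Q w
    norms.append(np.linalg.norm(w))
# w keeps growing; it aligns with the high-variance (first) input direction.
```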
Synaptic plasticity evoked artificially
Examples of long-term potentiation (LTP) and long-term depression (LTD).
LTP: first demonstrated by Bliss and Lømo in 1973. Since then it has been induced in many different ways, usually in slice preparations.
LTD: robustly shown by Dudek and Bear in 1992, in hippocampal slice.
Why is this interesting?
LTP and learning: e.g. the Morris water maze (Morris et al., 1986), where a rat learns the position of a hidden platform.
[Figure: time per quadrant (sec) across the four pool quadrants, comparing control animals with animals in which LTP was blocked.]
Before learning / after learning:
[Schematic diagram of a synapse, as before: transmitter, receptors, vesicles, channels, etc. determine the synaptic weight 𝒘.]
LTP will lead to new synaptic contacts
Synaptic Plasticity (Dudek and Bear, 1993):
[Figure: plasticity as a function of stimulation frequency: LTD (long-term depression) at low frequencies, LTP (long-term potentiation) at high frequencies, with the crossover near 10 Hz.]
Conventional LTP = Hebbian Learning
[Figure: symmetrical weight-change curve: synaptic change (%) against the timing of the pre- (t_Pre) and postsynaptic (t_Post) spikes.]
The temporal order of input and output does not play any role.
Spike-timing-dependent plasticity (STDP)
(Markram et al., 1997; Bi and Poo, 2001)
[Figure: neuron A drives neuron B through a synapse with weight ω (input u, output v). Pairing pre before post (+10 ms) yields LTP; post before pre (−10 ms) yields LTD.]
Spike Timing Dependent Plasticity: Temporal Hebbian Learning
[Figure: synaptic change (%) as a function of the time difference T [ms] between pre- (t_Pre) and postsynaptic (t_Post) spikes.]
Pre precedes post: long-term potentiation, causal (possibly).
Pre follows post: long-term depression, acausal.
Back to the math. We had the Hebb rule, averaged over all input patterns:
dw/dt = µ Q · w,  with Q = ⟨u uᵀ⟩ the input correlation matrix,
which yields an unstable (always growing) weight vector.
Covariance Rule(s)
Normally firing rates are only positive and plain Hebb would yield only LTP. Hence we introduce a threshold θ to also get LTD.
Output threshold:
dw/dt = µ (v − θ) u,  µ << 1
Input vector threshold:
dw/dt = µ v (u − θ),  µ << 1
Many times one sets the threshold as the average activity of some reference time period (training period): θ = ⟨v⟩ or θ = ⟨u⟩. Together with v = w · u we get:
dw/dt = µ C · w,  where C is the covariance matrix of the input:
C = ⟨(u − ⟨u⟩)(u − ⟨u⟩)ᵀ⟩ = ⟨u uᵀ⟩ − ⟨u⟩⟨u⟩ᵀ = ⟨(u − ⟨u⟩) uᵀ⟩
v < θ: homosynaptic depression
u < θ: heterosynaptic depression
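A minimal sketch of the input-threshold variant (the threshold value and all constants are illustrative): an input above its average is potentiated, a silent input paired with an active output is depressed (heterosynaptic depression):

```python
mu = 0.05
theta = 1.0  # input threshold, e.g. the average input over a training period

def cov_rule_step(w, u, v):
    """Covariance rule with input threshold: dw/dt = mu * v * (u - theta)."""
    return w + mu * v * (u - theta)

w = 0.5
w_ltp = cov_rule_step(w, u=2.0, v=1.0)  # input above threshold -> potentiation
w_ltd = cov_rule_step(w, u=0.0, v=1.0)  # silent input, active output -> depression
```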
The covariance rule can produce LTD without (!) postsynaptic activity. This is biologically unrealistic, and the BCM rule (Bienenstock, Cooper, Munro) takes care of this.
BCM Rule:
dw/dt = µ v u (v − θ),  µ << 1
[Figure: experiment (Dudek and Bear, 1992) compared with the BCM rule: weight change dw as a function of postsynaptic activity v for active presynaptic input u.]
As such this rule is again unstable, but BCM introduces a sliding threshold:
dθ/dt = ν (v² − θ),  ν < 1
Note: the rate of threshold change ν should be faster than the weight changes (µ), but slower than the presentation of the individual input patterns. This way the weight growth will be over-dampened relative to the (weight-induced) activity increase.
[Figure: open symbols, control condition; filled symbols, light-deprived. Less input leads to a shift of the threshold that enables more LTP (Kirkwood et al., 1996).]
BCM is just one type of (implicit) weight normalization.
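The BCM rule with its sliding threshold can be simulated in a few lines (a sketch; the constant presynaptic drive and all constants are illustrative assumptions). With constant input the output settles at the fixed point where v = θ = v², i.e. v = 1:

```python
mu, nu = 0.001, 0.05  # threshold changes faster than the weights (nu > mu)
u = 1.0               # constant presynaptic activity (toy setting)
w, theta = 0.5, 0.0

for _ in range(20000):
    v = w * u                         # postsynaptic activity
    w += mu * v * u * (v - theta)     # BCM rule: dw/dt = mu * v * u * (v - theta)
    theta += nu * (v ** 2 - theta)    # sliding threshold: dtheta/dt = nu * (v^2 - theta)
# w*u and theta both converge to 1: the sliding threshold stabilizes the rule.
```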
Evidence for weight normalization: reduced weight increase as soon as weights are already big (Bi and Poo, 1998, J. Neurosci.).
Problem: Hebbian learning can lead to unlimited weight growth.
Solution: weight normalization,
a) subtractive (subtract the mean change of all weights from each individual weight);
b) multiplicative (multiply each weight by a gradually decreasing factor).
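Both schemes in code (a sketch; Oja's rule is used here as one concrete multiplicative scheme, and all values are illustrative):

```python
import numpy as np

mu = 0.1

def hebb_subtractive(w, u, v):
    """Subtractive normalization: subtract the mean weight change from each
    individual weight, so the summed weight stays constant."""
    dw = mu * v * u
    return w + dw - dw.mean()

def hebb_multiplicative(w, u, v, alpha=1.0):
    """Multiplicative damping (Oja's rule as one concrete choice):
    each weight decays by a factor that grows with the output."""
    return w + mu * v * (u - alpha * v * w)

u = np.array([1.0, 0.0])

w_sub = np.array([0.5, 0.5])
w_sub2 = hebb_subtractive(w_sub, u, v=float(w_sub @ u))

w_mul = np.array([0.5, 0.5])
for _ in range(1000):
    w_mul = hebb_multiplicative(w_mul, u, v=float(w_mul @ u))
# w_mul converges to a bounded vector aligned with the active input.
```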
Examples of Applications
• Kohonen (1984): speech recognition, a map of phonemes in the Finnish language.
• Goodhill (1993): a model for the development of retinotopy and ocular dominance, based on Kohonen maps (SOM).
• Angéniol et al. (1988): the travelling salesman problem (an optimization problem).
• Kohonen (1990): learning vector quantization (a pattern classification problem).
• Ritter & Kohonen (1989): semantic maps.
[Figure: ocular dominance (OD) and orientation (ORI) maps.]
Differential Hebbian Learning of Sequences
Learning to act in response to sequences of sensor events.
[Overview diagram of the different methods repeated. „You are here!“]
History of the Concept of Temporally Asymmetrical Learning: Classical Conditioning (I. Pavlov)
Correlating two stimuli which are shifted with respect to each other in time.
Pavlov's dog: „Bell comes earlier than food.“
This requires the system to remember the stimuli.
Eligibility trace: a synapse remains „eligible“ for modification for some time after it was active (Hull 1938, then a still abstract concept).
Classical Conditioning: Eligibility Traces
[Figure: the unconditioned stimulus (food) enters with fixed weight w₀ = 1, the conditioned stimulus (bell) with plastic weight w₁; the bell leaves a stimulus trace E, so that Δw₁ > 0 when the food arrives later.]
The first stimulus needs to be „remembered“ in the system.
Eligibility Traces
Note: there are vastly different time scales for (Pavlov's) behavioural experiments, typically up to 4 seconds, as compared to STDP at neurons, typically 40-60 milliseconds (max.).
Defining the Trace
In general there are many ways to do this, but usually one chooses a trace that looks biologically realistic and allows for some analytical calculations, too. EPSP-like functions, all of the form h(t) = 0 for t < 0 and h(t) = hₖ(t) for t ≥ 0:
α-function: hₖ(t) = t e⁻ᵃᵗ
Double exponential: hₖ(t) = (e⁻ᵃᵗ − e⁻ᵇᵗ)/k. This one is most easy to handle analytically and, thus, often used.
Dampened sine wave: hₖ(t) = (1/b) sin(bt) e⁻ᵃᵗ. Shows an oscillation.
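The three trace kernels as code (the parameter values a, b, k are illustrative):

```python
import numpy as np

def alpha_kernel(t, a=0.1):
    """alpha-function: h(t) = t * exp(-a t) for t >= 0, else 0."""
    return np.where(t >= 0, t * np.exp(-a * t), 0.0)

def double_exp_kernel(t, a=0.05, b=0.2, k=1.0):
    """Double exponential: h(t) = (exp(-a t) - exp(-b t)) / k, requires b > a."""
    return np.where(t >= 0, (np.exp(-a * t) - np.exp(-b * t)) / k, 0.0)

def damped_sine_kernel(t, a=0.05, b=0.3):
    """Dampened sine wave: h(t) = (1/b) * sin(b t) * exp(-a t)."""
    return np.where(t >= 0, np.sin(b * t) * np.exp(-a * t) / b, 0.0)

t = np.linspace(-10, 200, 2101)  # step 0.1; includes t < 0 to show causality
traces = {"alpha": alpha_kernel(t),
          "double_exp": double_exp_kernel(t),
          "damped_sine": damped_sine_kernel(t)}
```

All three are causal (zero before the stimulus); only the damped sine dips below zero.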
[Overview diagram of the different methods repeated.]
The mathematical formulation of the learning rules is similar, but the time scales are much different.
Differential Hebb Learning Rule
[Figure: an early input x₁ („bell“) and a late input x₀ („food“) are each filtered into a traced input uᵢ; the weighted sum gives the output v, and the rule uses its derivative v′(t).]
dωᵢ/dt = µ uᵢ(t) v′(t)
Simpler notation: x = input, u = traced input.
Convolution is used to define the traced input; correlation is used to calculate the weight growth.
Convolution: h(x) = ∫ f(u) g(x − u) du = (f ∗ g)(x) = (g ∗ f)(x)
Correlation: h(x) = ∫ f(u) g(x + u) du = (f ⋆ g)(x) ≠ (g ⋆ f)(x)
Convolution is commutative; correlation is not.
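NumPy illustrates the asymmetry (the signals f and g are arbitrary toy vectors, e.g. a spike train and a short trace kernel):

```python
import numpy as np

f = np.array([0.0, 1.0, 0.0, 0.0])  # e.g. a single spike
g = np.array([0.5, 0.3, 0.1])       # e.g. a short trace kernel h

conv_fg = np.convolve(f, g)                 # (f * g)(x) = sum_u f(u) g(x - u)
conv_gf = np.convolve(g, f)                 # identical: convolution commutes
corr_fg = np.correlate(f, g, mode="full")   # sliding inner product
corr_gf = np.correlate(g, f, mode="full")   # NOT identical: order matters
```

Swapping the arguments of the correlation reverses the result rather than reproducing it.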
Differential Hebbian Learning
dωᵢ/dt = µ uᵢ(t) v′(t)   (filtered input × derivative of the output)
v(t) = Σᵢ ωᵢ(t) uᵢ(t)   (output)
This produces an asymmetric weight-change curve Δω(T) (if the filters h produce unimodal „humps“).
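A numerical sketch (kernel shapes and timing values are illustrative): filtering a pre- and a post-spike with the same unimodal kernel and integrating µ u v′ yields a positive total weight change when post follows pre, and a negative one in the reverse order:

```python
import numpy as np

def kernel(t, a=0.1, b=0.3):
    """Unimodal 'hump' (double exponential) used to filter the spikes."""
    return np.where(t >= 0, np.exp(-a * t) - np.exp(-b * t), 0.0)

dt = 0.1
t = np.arange(0.0, 400.0, dt)
mu = 1.0

def weight_change(T, t_pre=150.0):
    """Total dw = mu * integral u(t) v'(t) dt, pre at t_pre, post at t_pre + T."""
    u = kernel(t - t_pre)            # traced (filtered) input
    v = kernel(t - (t_pre + T))      # filtered output
    dv = np.gradient(v, dt)          # v'(t)
    return mu * np.sum(u * dv) * dt

dw_causal = weight_change(T=+20.0)   # post follows pre -> potentiation
dw_acausal = weight_change(T=-20.0)  # pre follows post -> depression
```

The curve Δω(T) is antisymmetric in T for identical filters, which is the asymmetry claimed above.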
Compare conventional LTP (= Hebbian learning): a symmetrical weight-change curve, where the temporal order of input and output does not play any role.
The same differential Hebbian rule, dωᵢ/dt = µ uᵢ(t) v′(t) with v(t) = Σᵢ ωᵢ(t) uᵢ(t), produces an asymmetric weight-change curve that can be compared with the STDP weight-change curve (Bi & Poo, 2001), plotted against T = t_Post − t_Pre [ms]: pre precedes post gives long-term potentiation, pre follows post gives long-term depression. Spike-timing-dependent plasticity (STDP): some vague shape similarity.
[Overview diagram of the different methods repeated. „You are here!“]
The biophysical equivalent of Hebb's postulate
[Figure: a plastic synapse (NMDA/AMPA) with a presynaptic signal (Glu) and a postsynaptic source of depolarization.]
Pre-post correlation, but why is this needed?
Plasticity is mainly mediated by so-called N-methyl-D-aspartate (NMDA) channels. These channels respond to glutamate as their transmitter and they are voltage dependent.
[Figure: the channel shown from the inside („in“) and outside („out“) of the membrane.]
Biophysical Model: Structure
[Figure: input x at an NMDA synapse driving the output v.]
Hence NMDA synapses (channels) do require a (Hebbian) correlation between pre- and postsynaptic activity!
Source of depolarization:
1) any other drive (AMPA or NMDA),
2) back-propagating spike.
Local Events at the Synapse
Current sources „under“ the synapse:
• the synaptic current, I_synaptic;
• the influence of a back-propagating spike, I_BP;
• currents from all parts of the dendritic tree, I_Dendritic.
[Figure: input x₁ traced to u₁; local (S_Local) and global (S_Global) summation drive the output v.]
On „Eligibility Traces“
[Figure: ISO-learning circuit. Inputs x₀ and x₁ are filtered by kernels h into traces, weighted by ω₀ and ω₁, and summed (Σ) into the output v with derivative v′. A presynaptic spike opens the NMDA conductance g_NMDA (on the order of 0.1 nS, decaying over tens of ms), which is combined with the depolarization V from a BP- or D-spike.]
Membrane potential:
C dV(t)/dt = (V_rest − V(t))/R + Σᵢ ωᵢ gᵢ(t) (Eᵢ − V(t)) + I_dep(t)
with weights ωᵢ, synaptic input conductances gᵢ (reversal potentials Eᵢ), and a depolarization source I_dep.
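The membrane equation above can be integrated with a simple forward-Euler sketch (all parameter values are illustrative, not fitted to data):

```python
import numpy as np

# Illustrative parameters (arbitrary units consistent with each other).
C, R = 1.0, 10.0            # capacitance, membrane resistance
V_rest, E_syn = -65.0, 0.0  # resting and synaptic reversal potential [mV]
dt = 0.1
t = np.arange(0.0, 100.0, dt)

w = 0.5                                            # synaptic weight
g = np.where((t >= 20) & (t < 40), 0.05, 0.0)      # synaptic conductance pulse g(t)
I_dep = np.where((t >= 30) & (t < 35), 1.0, 0.0)   # extra depolarization source

V = np.empty_like(t)
V[0] = V_rest
for k in range(1, len(t)):
    dV = ((V_rest - V[k-1]) / R
          + w * g[k-1] * (E_syn - V[k-1])
          + I_dep[k-1]) / C
    V[k] = V[k-1] + dt * dV
# V depolarizes toward E_syn during the pulse and relaxes back to V_rest.
```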
Model structure
• Dendritic compartment.
• Plastic synapse with NMDA channels (NMDA/AMPA): source of Ca²⁺ influx and coincidence detector.
• Source of depolarization: 1. back-propagating spike, 2. local dendritic spike.
dV/dt ~ Σᵢ gᵢ(t) (Eᵢ − V) + I_dep
Plasticity Rule (Differential Hebb)
Instantaneous weight change:
dω/dt = µ c_N(t) F′(t)
with the presynaptic influence c_N (the glutamate effect on the NMDA channels) and the postsynaptic influence F′.
Presynaptic influence
[Figure: NMDA conductance g_NMDA(t) in nS, on the order of 0.1 nS, decaying over ~80 ms.]
Normalized NMDA conductance:
c_N(t) = (e^(−t/τ₁) − e^(−t/τ₂)) / (1 + [Mg²⁺] e^(−γV))
The double exponential captures the channel kinetics, and the denominator the voltage-dependent Mg²⁺ block. NMDA channels are instrumental for LTP and LTD induction (Malenka and Nicoll, 1999; Dudek and Bear, 1992).
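A sketch of this conductance (the time constants, the Mg²⁺ concentration and the voltage-sensitivity constant γ are illustrative assumptions): depolarization relieves the Mg²⁺ block, so the same glutamate pulse produces a much larger conductance at depolarized potentials:

```python
import numpy as np

def c_N(t, V, tau1=40.0, tau2=2.0, mg=1.0, gamma=0.06):
    """Normalized NMDA conductance: double-exponential kinetics gated by the
    voltage-dependent Mg2+ block (all constants illustrative)."""
    time_course = np.where(t >= 0, np.exp(-t / tau1) - np.exp(-t / tau2), 0.0)
    mg_block = 1.0 / (1.0 + mg * np.exp(-gamma * V))
    return time_course * mg_block

t = np.linspace(0, 80, 801)
g_hyper = c_N(t, V=-70.0)  # hyperpolarized: channel largely blocked
g_depol = c_N(t, V=0.0)    # depolarized: block relieved
```

This multiplicative gating is exactly the pre-post coincidence requirement: transmitter alone (hyperpolarized) barely opens the channel.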
Depolarizing potentials in the dendritic tree
[Figure: membrane-potential traces V [mV] over time: dendritic spikes (Larkum et al., 2001; Golding et al., 2002; Häusser and Mel, 2003) and back-propagating spikes (Stuart et al., 1997).]
Postsynaptic influence
In the plasticity rule dω/dt = µ c_N(t) F′(t), F is a low-pass filtered („slow“) version of a back-propagating or a dendritic spike.
BP and D-Spikes
[Figure: membrane-potential traces V [mV] of back-propagating and dendritic spikes together with their low-pass filtered („slow“) versions, on time scales from ~20 ms up to ~150 ms.]
Weight Change Curves. Source of Depolarization: Back-Propagating Spikes
[Figure: NMDA-receptor activation paired with a back-propagating spike at time difference T = t_Post − t_Pre yields an STDP-like weight-change curve Δω(T).]
THINGS TO REMEMBER
The biophysical equivalent of Hebb's PRE-POST CORRELATION postulate:
• a presynaptic signal (glutamate) at a plastic NMDA/AMPA synapse, with the slow-acting NMDA signal as the presynaptic influence;
• a postsynaptic source of depolarization; possible sources are a BP-spike, a dendritic spike, or local depolarization.
One word about Supervised Learning
[Overview diagram of the different methods repeated. „You are here!“: supervised learning (δ-rule), and many more.]
Supervised learning methods are mostly non-neuronal and will therefore not be discussed here.
So far: OPEN LOOP LEARNING. All slides so far!
CLOSED LOOP LEARNING
• Learning to act (to produce appropriate behavior).
• Instrumental (operant) conditioning.
All slides to come now!
[Figure: Pavlov, 1927. The temporal sequence bell → food → salivation, with the bell as conditioned input on a second sensor. This is an open-loop system!]
[Figure: closed loop. An adaptable neuron coupled to the environment (Env.) through sensing and behaving.]