Page 1: Information Theory and Neuroscience I: Optimal Storage in Noisy Synapses

John Z. Sun and Da Wang
Massachusetts Institute of Technology

October 14, 2009

Page 2: Information Theory and Neuroscience I: Optimal Storage in ...

Information Theory and Neuroscience I: Optimal Storage in Noisy Synapses

Outline

An Introduction to Neuroscience

Information Theory

Questions about Synapses

Theoretical Framework

Investigations
• Optimal Storage Capacity Per Unit Volume
• Optimal Distribution of Synaptic Weights
• Synaptic Cost Function from Its Weight Distribution

Summary and Remarks

Page 3: The Brain

• Controls human behavior
• Is an extremely powerful computer, with “switches” being neurons, connected by “wires” called synapses
• Humans have tens of billions of neurons
• Has fan-in and fan-out on the order of 10⁴

Page 4: Neurons

Page 5: Communication through Synapses

• Neurons have different objectives and behaviors:
• Propagate information to other parts of the body
• Store memory
• Filter stimuli

Page 6: Communication through Synapses

• Modeled in first-order approximation as classical integrate-and-fire (CIF)
• Importance of each synapse is quantified by its synaptic weight
• Synaptic weight (or efficacy) is defined as the average EPSP amplitude over trials
• Experiments show that increasing EPSP amplitude increases SNR
• Experiments show that synaptic weight is proportional to synaptic volume

Page 7: Communication through Synapses

• Neurons send and receive pulses to communicate
• These are called spike trains or action potentials
• Several ideas on how information is sent:
  • Temporal codes
  • Rate codes
  • Spatial-temporal codes
• Usually approximated as digital, slotted, random transmissions

Page 8: Information Theory

[Figure: Shannon's Fig. 1, “Schematic diagram of a general communication system”: information source → transmitter → (signal / received signal) → receiver → destination, with a noise source acting on the channel; “message” labels the source output and the receiver output.]

a decimal digit is about 3⅓ bits. A digit wheel on a desk computing machine has ten stable positions and therefore has a storage capacity of one decimal digit. In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by log_b a.

By a communication system we will mean a system of the type indicated schematically in Fig. 1. It consists of essentially five parts:

1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal. The message may be of various types: (a) A sequence of letters as in a telegraph or teletype system; (b) A single function of time f(t) as in radio or telephony; (c) A function of time and other variables as in black and white television, where the message may be thought of as a function f(x, y, t) of two space coordinates and time, the light intensity at point (x, y) and time t on a pickup tube plate; (d) Two or more functions of time, say f(t), g(t), h(t), as in "three-dimensional" sound transmission or if the system is intended to service several individual channels in multiplex; (e) Several functions of several variables: in color television the message consists of three functions f(x, y, t), g(x, y, t), h(x, y, t) defined in a three-dimensional continuum (we may also think of these three functions as components of a vector field defined in the region; similarly, several black and white television sources would produce "messages" consisting of a number of functions of three variables); (f) Various combinations also occur, for example in television with an associated audio channel.

2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel. In telephony this operation consists merely of changing sound pressure into a proportional electrical current. In telegraphy we have an encoding operation which produces a sequence of dots, dashes and spaces on the channel corresponding to the message. In a multiplex PCM system the different speech functions must be sampled, compressed, quantized and encoded, and finally interleaved properly to construct the signal. Vocoder systems, television and frequency modulation are other examples of complex operations applied to the message to obtain the signal.

3. The channel is merely the medium used to transmit the signal from transmitter to receiver. It may be a pair of wires, a coaxial cable, a band of radio frequencies, a beam of light, etc.

4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal.

5. The destination is the person (or thing) for whom the message is intended.

We wish to consider certain general problems involving communication systems. To do this it is first necessary to represent the various elements involved as mathematical entities, suitably idealized from their physical counterparts. We may roughly classify communication systems into three main categories: discrete, continuous and mixed. By a discrete system we will mean one in which both the message and the signal are a sequence of discrete symbols. A typical case is telegraphy where the message is a sequence of letters and the signal a sequence of dots, dashes and spaces. A continuous system is one in which the message and signal are both treated…


[Shannon (1948)]

• Deals with asymptotic performance of information transfer
• Elegant theory for probabilistic sources and channels
• Introduced the concepts of entropy, capacity, and mutual information

Page 9: Information Theory

• Assume input X and output Y to the channel
• The channel is then described by P(Y|X)
• Want to find the P(X) that maximizes I(X;Y); the maximum is the capacity C
• Extensions include adding a cost constraint b(x) on the input
• In general, a very difficult optimization problem (see the numerical sketch below)
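As an illustration of the maximization just described, here is a minimal sketch (not from the talk) of the standard Blahut-Arimoto iteration for a discrete memoryless channel. The 2x2 channel matrix is a made-up toy example, and this unconstrained version ignores the cost term b(x).

```python
import numpy as np

def blahut_arimoto(P_yx, iters=1000):
    """Blahut-Arimoto iteration: numerically find the capacity-achieving
    input distribution P(X) of a discrete memoryless channel.
    P_yx[i, j] = P(Y = j | X = i); each row sums to 1."""

    def kl_rows(P, q):
        # D( P(Y|x) || P(Y) ) for every input symbol x, in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            t = P * np.log(P / q)
        return np.where(P > 0, t, 0.0).sum(axis=1)

    p = np.full(P_yx.shape[0], 1.0 / P_yx.shape[0])  # start from uniform P(X)
    for _ in range(iters):
        d = kl_rows(P_yx, p @ P_yx)  # p @ P_yx is the output marginal P(Y)
        p = p * np.exp(d)            # multiplicative reweighting
        p /= p.sum()
    d = kl_rows(P_yx, p @ P_yx)
    return p, float(p @ d)           # (optimal P(X), capacity in nats)

# Toy binary asymmetric channel: illustrative numbers only
P_yx = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
p_opt, C = blahut_arimoto(P_yx)
print("capacity-achieving P(X):", p_opt, " C:", C, "nats")
```

With an input cost b(x) and budget B, the same iteration picks up a Lagrange-multiplier factor exp(−λ b(x)) in the update, which is the constrained problem the last bullet alludes to.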

Page 10: The Theory of Everything?

“Information theory has, in the last few years, become something of a scientific bandwagon... Although this wave of popularity is certainly pleasant and exciting for those of us working in the field, it carries at the same time an element of danger.” [Shannon (1956)]

“There is a constructive alternative for the author of this paper. If he is willing to give up larceny for a life of honest toil, he can find a competent psychologist and spend several years at intensive mutual education, leading to productive joint research.” [Elias (1958)]

Page 11: Information Theory and Neuroscience

• Capacity for specific neuronal models
  [MacKay & McCulloch (1952)], [Stein, French & Holden (1972)], [Eckhorn & Pöpel (1975)]
• Using mutual information for sensory coding
  [Lewicki (2002)], [Smith & Lewicki (2006)], [Chechik et al. (2006)]
• Finding optimal P(X) given cost constraints
  [Varshney, Sjöström & Chklovskii (2006)], [Berger & Levy (2009)]
• Surveys on IT in Neuroscience
  [Rieke et al. (1996)], [Borst & Theunissen (1999)], [Johnson (2003)], [HST.722J]

Page 12: Questions about Synapses

Varshney et al. attempt to answer the following questions about synapses:

• Why are typical central synapses noisy?
  • Arriving spikes often fail to evoke an EPSP.
  • EPSP amplitude varies from trial to trial.
  • Synaptic unreliability is detrimental for transmission.
• Why does the distribution of synaptic strengths display a notable tail of strong connections?
  • Most synaptic efficacies are weak (mean EPSP < 1 mV).
  • But a notable tail of stronger synaptic strengths (mean EPSP > 20 mV) is observed.
• Why is synaptic connectivity sparse?
  • Connectivity is sparse at both local and global levels.
  • Increased connectivity would enhance system functionality.

Page 13: Theoretical Framework

Develop a theoretical framework based on the role of synapses as a mechanism of information storage.

Optimization Approach

• Evolution by natural selection favors genotypes of high fitness.
• Qualities that affect fitness tend to improve through evolution.
• Mathematical analysis of fitness optimization therefore seems reasonable as an approach to understanding organisms.

Page 14: Memory as a Channel

Memory channel:

Input distribution P(X):
• distribution of synaptic weights across synapses
• X = 0: zero-weight connections

Channel P(Y|X) determined by the noise sources:
• storage noise
• in situ noise
• read-out noise

Remarks

• Each synapse is one channel use.
• Channel uses are separated in space, rather than time.

Page 15: Optimization Formulation

Maximizing information storage of synapses under resource constraints.

Storage capacity:

  C = max_{P(X)} I(X;Y)

where

  SNR = E[X²/A_N²]

(A_N is the noise amplitude; see the next slide.)

Constraints
• Minimize volume usage.
  • Cost function b(x) is determined by synapse volume V.
  • Known: synaptic weight X ∝ volume V.
• No specific assumptions regarding the neural network.
• No specific assumptions on error-control coding.

Page 16: Synaptic Weight and Synaptic Volume

Experimental studies show that synaptic weight (efficacy) X correlates positively with synapse volume V(X):

  V(X)/V_N = (X/A_N)^α

where
• A_N: noise amplitude.
• V(X): volume of a synapse with weight X (V(X = x) may be written V_x).
• V_N: normalization volume constant (the volume of a synapse with SNR = 1).
• α: not precisely determined; possibly α ≤ 1.

Page 17: Cost Function

• In addition to the volume V(X) of the synapse itself, there is a cost of accessory volume V_0 > 0 that supports a synapse.
• There are no free symbols.
• Cost function:

  b(X) = V(X) + V_0

Average (expected) cost:

  B ≜ E[b(X)] = E[V(X)] + V_0

Using 〈·〉 for average/expectation:

  〈V〉 = E[V(X)]
  〈b〉 = B = 〈V〉 + V_0

Page 18: Optimal Storage Capacity Per Unit Volume

Given the mathematical model of neuron memory:
• Perform optimization to generate predictions and explanations.
• Optimize information storage capacity per unit volume, i.e., capacity per unit cost.
• Since there is no zero-cost symbol (V_0 > 0), the analytical results of [Verdú, 1990 & 2002] cannot be applied.

Page 19: Special Case: AWGN

Let α = 2:

  SNR = E[X²/A_N²] = 〈V〉/V_N

Let the channel be AWGN:

  C_AWGN = (1/2) log(1 + SNR)

Capacity per unit volume:

  C/B = C/(〈V〉 + V_0) = log(1 + 〈V〉/V_N) / (2(〈V〉 + V_0))

Optimizing over 〈V〉 depends on the accessory volume V_0.

When V_0 ≪ V_N, capacity per unit volume is maximized by 〈V〉 ≈ √(V_0 V_N).

Exact dependence: see the next slide.
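A quick numerical check of the last claim, as a sketch with arbitrary illustrative values (V_0 ≪ V_N; none of these numbers are from the slides):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cap_per_volume(v_mean, v0, vn):
    """Capacity per unit volume for the AWGN case on this slide:
    C/B = log(1 + <V>/V_N) / (2 (<V> + V_0)), in nats per unit volume."""
    return np.log(1.0 + v_mean / vn) / (2.0 * (v_mean + v0))

v0, vn = 1e-4, 1.0  # illustrative values with V_0 << V_N
res = minimize_scalar(lambda v: -cap_per_volume(v, v0, vn),
                      bounds=(1e-9, 10.0), method="bounded")
print("numerical optimum <V>:", res.x)             # ~1.4e-2
print("sqrt(V_0 V_N)        :", np.sqrt(v0 * vn))  # 1.0e-2, same order
```

The numerical optimum agrees with √(V_0 V_N) up to a constant factor (it approaches √(2 V_0 V_N) as V_0 → 0), and either way the optimal SNR = 〈V〉/V_N comes out below one: small, noisy synapses.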

Page 20: Determining the optimal B

• V_0 is small, so 〈V〉 should be such that SNR < 1.
• V_0 is not infinitesimal, so 〈V〉 should be such that SNR is not infinitesimal.
• A similar conclusion holds for general channels, by concavity of C(B).

[Figure: information storage capacity C as a concave function of volume cost B, with V_0 and the optimal cost V* marked on the B-axis.]

Page 21: Answer to the First Question

Optimization Principle 1
To optimize information storage capacity per unit volume of neural tissue, synapses should be small and noisy on average.

Several experimental studies show similar results:
• Average SNR is less than one, but not infinitesimal.

Noisy synapses maximize information storage capacity!

Page 22: Establishing Synaptic Efficacy Distribution

• Want a full distributional characterization of the synaptic weights: the capacity-achieving input distribution.
• The AWGN assumption is not precise:
  • non-negativity
  • it is unlikely that α = 2
• A full input-output characterization of the synaptic memory channel is not available.
• Look at approximations!

Page 23: ε-Capacity Cost Approximation

To yield an exactly solvable optimization problem:
• Assume the noise is approximately bounded (by ε).
• Partition the input space so that X ∈ {0, 1, 2, …}.

This converts the memory channel into a zero-error, discrete-input channel. [Kolmogorov & Tihomirov, 1959; Root, 1968]

Page 24: Maximum Entropy Method

The channel is noiseless ⇒ the input distribution with maximum entropy achieves capacity.

Need to maximize entropy H(X) per average volume V̄ = 〈V〉 + V_0:

  maximize   I_synapse = H(X)
  subject to Σ_x P(x) V(x) = V̄

⇒ P_opt(x) = exp(−β V(x)), with β fixed by the normalization Σ_x P_opt(x) = 1.

The storage capacity per unit volume:

  I_synapse / V̄ = β Σ_x V(x) exp(−β V(x)) / Σ_x V(x) exp(−β V(x)) = β
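A small numerical sanity check of the identity above (a sketch; the discrete volume levels V(x) are arbitrary stand-ins): fix β by the normalization condition, then confirm that H(X)/V̄ = β.

```python
import numpy as np
from scipy.optimize import brentq

V = np.array([0.05, 1.0, 2.0, 3.0])   # hypothetical per-state volumes V(x)

# beta is pinned down by requiring P(x) = exp(-beta V(x)) to sum to 1
beta = brentq(lambda b: np.exp(-b * V).sum() - 1.0, 1e-6, 100.0)

P = np.exp(-beta * V)
H = -(P * np.log(P)).sum()    # entropy H(X) in nats
V_bar = (P * V).sum()         # average volume per synapse
print(beta, H / V_bar)        # the two agree: capacity per volume = beta
```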

Page 25: Equidistant Assumption & Filling Fraction

Motivated by experimental observations, assume the synaptic state volumes are distributed equidistantly:

  V(i) = V_0 + 2i V_N,  i = 0, 1, 2, …

Then

  f = 1 − exp(−β V_0)
  〈V〉_{i>0} = 2 V_N exp(β V_0)

where f is the fraction of synapses in states X > 0 (the filling fraction).

The optimum is achieved when V_0 is as small as possible:
⇒ f ≪ 1
⇒ 〈V〉_{i>0} ≈ 2 V_N
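The same root-finding idea as on the previous slide, specialized to the equidistant model, reproduces the closed forms above (the V_0 and V_N values are illustrative only):

```python
import numpy as np
from scipy.optimize import brentq

v0, vn = 0.1, 1.0   # illustrative accessory and spacing volumes

# normalization over i = 0, 1, 2, ...:
#   sum_i exp(-b (V_0 + 2 i V_N)) = exp(-b V_0) / (1 - exp(-2 b V_N)) = 1
beta = brentq(lambda b: np.exp(-b * v0) / (1 - np.exp(-2 * b * vn)) - 1.0,
              1e-6, 50.0)

f = 1.0 - np.exp(-beta * v0)             # filling fraction P(X > 0)
mean_v_pos = 2 * vn * np.exp(beta * v0)  # closed-form <V>_{i>0}

# direct check against the explicit geometric distribution
i = np.arange(2000)
P = np.exp(-beta * (v0 + 2 * i * vn))
print(f, P[1:].sum())                    # filling fraction, two ways
print(mean_v_pos, (P[1:] * 2 * i[1:] * vn).sum() / P[1:].sum())
```

For these made-up values the filling fraction comes out near 0.1, which happens to sit at the low end of the empirical range quoted on the next slide.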

Page 26: Sparse Synaptic Connectivity

Optimization Principle 2
To optimize information storage capacity per unit volume, the filling fraction should be small. A small filling fraction is equivalent to sparse synaptic connectivity.

• The filling fraction is between 0.1 and 0.3 for various brain regions in various mammals (Stepanyants et al., 2002).
• Using experimentally determined values of V_0, the predicted and actual filling fractions are quite close to each other.

Page 27: Distribution of Volume and Efficacy

• The capacity-achieving volume distribution:

  P(x) = exp(−β V(x))

• Combining this with the relationship between volume cost and efficacy yields the capacity-achieving input distribution:

  P(x) = exp(−β V_N (x/A_N)^α)

• A stretched exponential, with the stretching exponent determined by α.

Supplementary Figure 4. Distribution of synaptic weights sorted into bins whose width is given by two standard deviations for the corresponding EPSP amplitude. Blue squares: fraction of neuron pairs belonging to a bin centered on that synaptic weight. Black line: stretched exponential fit (i.e. power law in the log domain) with exponent 0.49. Unconnected pairs of neurons as well as very weak connections (<0.1 mV) are pooled in the first bin, which corresponds to zero-weight synapses (absent and silent synapses). For this reason, the gap between the zero-weight synapses and the peak of the distribution does not appear in this graph (cf. Supplementary Figure 1 and Section VI; also see Figure 5 in Song et al., 2005).

Theoretical fit of the experimental data:

  α = 0.79
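For concreteness, a minimal sketch of this kind of stretched-exponential fit; the binned (amplitude, fraction) pairs below are synthetic placeholders generated with exponent 0.8, not the Song et al. measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def stretched_exp(x, c, alpha):
    """Stretched exponential P(x) = exp(-c * x**alpha)."""
    return np.exp(-c * np.power(x, alpha))

# synthetic binned data: (EPSP amplitude in mV, fraction of neuron pairs)
rng = np.random.default_rng(0)
x = np.array([0.2, 0.5, 1.0, 2.0, 4.0, 8.0])
p = stretched_exp(x, 1.5, 0.8) * (1 + 0.05 * rng.standard_normal(x.size))

(c_hat, a_hat), _ = curve_fit(stretched_exp, x, p, p0=(1.0, 1.0))
print("fitted stretching exponent:", a_hat)   # ~0.8 by construction
```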

Page 28: Presence of Strong Synapses

Optimization Principle 3
To optimize information storage capacity per unit volume, although there will be many absent synapses and numerous small synapses, there will also be some large synapses, with a synaptic efficacy distribution like a stretched exponential.

Page 29: Inferring the Input Cost Function

• The previous result was based on the ε-capacity cost approximation.
• Alternative: an inverse information theory approach.
• Ask: what is the cost function b(·) for which the measured P(X) and P(Y|X) are optimal?

  C(B) = max_{P(X): E[b(X)] ≤ B} I(P(X); P(Y|X))

Choose the cost function b(·) so that the fixed, measured P(X) is the optimal solution.
⇒ Under proper conditions [Gastpar, 2003]:

  b(x) = v · D(P_{Y|X}(·|x) ‖ P_Y(·)) + v_0

where v > 0 and v_0 is an arbitrary constant.
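A discrete sketch of this inverse construction (the channel matrix and input distribution are illustrative, not the measured synaptic data): the cost of each input symbol is read off as the KL divergence between its output distribution and the output marginal.

```python
import numpy as np

def inverse_cost(p_x, P_yx, v=1.0, v0=0.0):
    """b(x) = v * D( P(Y|x) || P(Y) ) + v0: a cost function under which
    the given p_x is capacity-achieving [Gastpar, 2003]."""
    q = p_x @ P_yx                  # output marginal P(Y)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = P_yx * np.log(P_yx / q)
    d = np.where(P_yx > 0, t, 0.0).sum(axis=1)
    return v * d + v0

p_x = np.array([0.7, 0.2, 0.1])         # hypothetical weight distribution
P_yx = np.array([[0.80, 0.15, 0.05],    # hypothetical read-out channel
                 [0.20, 0.60, 0.20],
                 [0.05, 0.15, 0.80]])
print("inferred costs b(x):", inverse_cost(p_x, P_yx))
```

On real data, fitting a power law to the inferred b(x) over the measured efficacies x is what yields the α = 0.77 on the next slide.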

Page 30: Inverse Information Theory Approach

AWGN channel:

  V(x) ∼ x²

Discrete noiseless channel:

  V(x) ∼ −log P(x)

(∼ denotes equality up to an affine transformation.)

• Obtain P(Y|x) and P(Y) from experimental data.
• Calculate V(x) accordingly.
• A power-law fit of V(x) gives:

  α = 0.77

Agrees with the previous approach!

Page 31: Remarks on Modeling

• Optimality in biological systems
  • Does evolution find optimal design through optimization?
  • Causality versus correlation
• Caveats in modeling systems
  • Follow the scientific method
  • Create testable hypotheses and rigorous experiments
  • Models can only be validated, never proven (falsifiability)

Page 32: Discussion

“The strength of the information theory approach is that it provides an upper bound on information storage... without considering how memory is stored, retrieved, coded, or decoded. Regardless of whether error-correcting codes are used or not, the capacity achieving input distribution must be used for optimal performance.” [Varshney, Sjöström & Chklovskii (2006)]

Page 33: Summary

• Developed a theory of information storage in the brain using an information-theoretic optimization
  • Optimal distribution of synaptic weights
  • Volume and structure from the weight distribution
  • No restrictive model assumptions
• Validates (unintuitive) observations from experiments
• Predicts relationships for uninvestigated aspects of synapses
  • α as the power-law exponent between synapse volume and efficacy
  • β as the parameter of the synapse volume distribution

Page 34: Issues

Memory as a Channel:
• Is each synapse a channel use?
• How does this relate to long-term memory?

Page 35: Issues

The model holds only when memory is the main function of the synapse.
• For information transmission, synapses may be large and reliable.
• This is our discussion for next time...

T. Berger & W. Levy, “Information Transfer by Energy-Efficient Neurons,” 2009.
