
An exactly solvable maximum entropy model

Peter Latham
Gatsby Computational Neuroscience Unit, UCL

CNS, July 20, 2006

The neural coding problem

s  →  r1, r2, ..., rn

What we ultimately want is the posterior, P(s|r1, r2, ..., rn). What we can try to measure is P(r1, r2, ..., rn|s); Bayes connects the two:

   P(s|r1, r2, ..., rn) = P(r1, r2, ..., rn|s) P(s) / P(r1, r2, ..., rn)

[Figure: histogram of P(r|s) vs. response r, 0–20.]

response: one neuron, spike count, 300 ms bins.
decent histogram: ~20 responses, ~200 trials/stimulus.

[Figure: 2-D histogram of P(r1, r2|s) over (r1, r2); 200 trials is just not enough.]

response: two neurons, spike count, 300 ms bins.
decent histogram: ~20² = 400 responses, ~4,000 trials/stimulus.

more realistic case ...

P(r1, r2, ..., r10|s) = ???

10-D histogram, ~10¹³ responses.

time to collect: 20,000,000 years/stimulus.

Clearly, an approximate approach is needed.

There are several possibilities:

1. Assume independence: p(r|s) = p(r1|s) p(r2|s) …

2. Parametric models:
   2a. Point process models.
   2b. Gaussian approximation (for rates).
   2c. Maximum entropy models.

Questions:

1. Are maximum entropy models useful for neural data?  I'm not sure.

2. How do we assess model quality?  Not the way you might think.

3. How tractable are these models?  Not very.

The idea behind maximum entropy models:

1. Measure, from data, some aspect of a probability distribution.

2. Find the maximum entropy distribution consistent with that measurement.

An example (in 1-D, with no dependence on s).

1. Estimate the mean response from data:

   r̄ = (1/K) ∑k=1..K r(k)

2. Find the maximum entropy distribution consistent with this:

   ∂/∂p(r) [ −∑r' p(r') log p(r') − λ0 ∑r' p(r') − λ ∑r' r' p(r') ] = 0

   (the three terms are the entropy, the normalization constraint, and the mean constraint)

   −1 − log p(r) − λ0 − λr = 0   =>   p(r) = exp[−1 − λ0 − λr]

To find λ0 and λ:

   p(r) = exp(−λr) / Z

   ∑r p(r) = 1   =>   Z = ∑r exp(−λr)

   r̄ = ∑r r exp(−λr) / Z   determines λ.
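As a concrete illustration of this step, here is a minimal sketch (mine, not from the talk) that fits λ by one-dimensional root finding, assuming a hypothetical response range of 0–20 spikes per bin and made-up trial counts:

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical discrete response space: spike counts 0..20 in a 300 ms bin.
r_vals = np.arange(21)

def model_mean(lam):
    """Mean of p(r) = exp(-lam*r)/Z over r_vals."""
    w = np.exp(-lam * r_vals)
    return np.sum(r_vals * w) / np.sum(w)

def fit_lambda(empirical_mean):
    """Solve model_mean(lam) = empirical_mean for lam (1-D root find)."""
    return brentq(lambda lam: model_mean(lam) - empirical_mean, -5.0, 5.0)

# Fake "recorded" spike counts, just to have an empirical mean to match.
rng = np.random.default_rng(0)
trials = rng.poisson(4.0, size=200).clip(0, 20)
r_bar = trials.mean()

lam = fit_lambda(r_bar)
p_r = np.exp(-lam * r_vals) / np.sum(np.exp(-lam * r_vals))
print(f"empirical mean = {r_bar:.3f}, model mean = {model_mean(lam):.3f}, lambda = {lam:.3f}")
```

Here λ0 has been absorbed into Z, exactly as in the formula above.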

Adding more constraints just adds more Lagrange multipliers. Matching the second moment as well gives

   p(r) = exp(−λr − λ1 r²) / Z

and, for two neurons, matching means, second moments, and the correlation gives

   p(r1, r2) = exp(−λ1 r1 − λ2 r2 − λ11 r1² − λ12 r1 r2 − λ22 r2²) / Z

An aside:

Maximum entropy => Maximum likelihood.

It's just another parametric model; it's just one that lives in the exponential family.
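A short sketch of why this holds for the 1-D model above (my own illustration, reusing the hypothetical 0–20 response range): the derivative of the average log-likelihood with respect to λ is (model mean − empirical mean), so maximizing the likelihood lands on exactly the moment-matching, i.e. maximum entropy, solution.

```python
import numpy as np

r_vals = np.arange(21)            # hypothetical response space, as above

def model_mean(lam):
    w = np.exp(-lam * r_vals)
    return np.sum(r_vals * w) / np.sum(w)

def fit_by_max_likelihood(r_bar, lr=0.02, n_steps=5000):
    """Gradient ascent on the mean log-likelihood of p(r) = exp(-lam*r)/Z.

    d/d(lam) of the mean log-likelihood is (model mean - empirical mean),
    so the maximum-likelihood fixed point is exactly the moment-matching
    (maximum entropy) condition.
    """
    lam = 0.0
    for _ in range(n_steps):
        lam += lr * (model_mean(lam) - r_bar)
    return lam

print(fit_by_max_likelihood(r_bar=4.0))   # same lambda as the root-finding fit
```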

Assessing goodness of fit: KL distance.

   D(p(r)||p(r|λ)) = ∑r p(r) log [p(r)/p(r|λ)]
                   = ∑r p(r) log p(r) − ∑r p(r) log p(r|λ)
                   = −H(r) − ∑r p(r) log p(r|λ)

where H(r) is the (true) entropy.

Our original example: p(r|λ) = exp(−λr) / Z. Then

   −∑r p(r) log p(r|λ) = λ ∑r p(r) r + log Z
                       = λ ∑r p(r|λ) r + log Z      (the model matches the mean)
                       = −∑r p(r|λ) log p(r|λ)
                       = H(r|λ)

Assessing goodness of fit: KL distance.

   D(p(r)||p(r|λ)) = ∑r p(r) log [p(r)/p(r|λ)]
                   = −H(r) + H(r|λ)
                   = model entropy − true entropy

where H(r) is the true entropy and H(r|λ) is the entropy under the model.
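A quick numerical check of this identity (my own sketch; the "true" p(r) here is an arbitrary toy distribution): fit λ to the mean of p, then compare the directly computed KL divergence with model entropy minus true entropy.

```python
import numpy as np
from scipy.optimize import brentq

r_vals = np.arange(21)

# Arbitrary toy "true" distribution over spike counts (for illustration only).
p_true = np.exp(-0.5 * (r_vals - 3.0) ** 2 / 4.0)
p_true /= p_true.sum()

# Fit the maxent model p(r|lam) = exp(-lam*r)/Z to the true mean.
true_mean = np.sum(r_vals * p_true)
model_mean = lambda lam: np.sum(r_vals * np.exp(-lam * r_vals)) / np.sum(np.exp(-lam * r_vals))
lam = brentq(lambda l: model_mean(l) - true_mean, -5.0, 5.0)
p_model = np.exp(-lam * r_vals) / np.sum(np.exp(-lam * r_vals))

H_true  = -np.sum(p_true * np.log(p_true))
H_model = -np.sum(p_model * np.log(p_model))
D_direct = np.sum(p_true * np.log(p_true / p_model))

print(f"D(p||p_lam)       = {D_direct:.6f}")
print(f"H_model - H_true  = {H_model - H_true:.6f}")   # same number
```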

We have a problem:

   D(p(r)||p(r|λ)) = model entropy − true entropy

Although we might be able to compute the model entropy, there's no way in hell we can compute the true entropy: the number of possible responses blows up with the number of neurons.

"Solution": an exactly solvable model, for which, in the large N limit, we can compute both the true entropy and the model entropy.

N binary neurons.

[Figure: spike raster, neuron vs. time, discretized into binary words, e.g. 0110010111, 1111011110, 0001100100.]

ri = response of neuron i ∈ {−1, 1}:
   −1: 0 spikes in a bin
   +1: 1 or more spikes in a bin

Example: 0110010111  =>  r = (−1, 1, 1, −1, −1, 1, −1, 1, 1, 1)
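For concreteness, a tiny sketch (mine, with made-up spike counts) of this encoding: each neuron's count in a bin maps to −1 (no spikes) or +1 (one or more spikes).

```python
import numpy as np

# Hypothetical spike counts: rows = neurons, columns = time bins.
counts = np.array([[0, 2, 1, 0],
                   [1, 0, 0, 3],
                   [0, 0, 1, 1]])

# -1 if 0 spikes in the bin, +1 if 1 or more spikes.
r = np.where(counts > 0, 1, -1)
print(r)
# Each column of r is one binary "word": the population response in that bin.
```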

Exactly solvable model (suppressing the dependence on the stimulus):

   p(r) = ∑θ p(θ) ∏i p(ri|θ)

Computing the true entropy, H(r):

   H(r) − H(r|θ) = I(r;θ) ≤ H(θ)

where I(r;θ) is the mutual information and H(θ) is the entropy of p(θ). Therefore

   H(r) = H(r|θ) + I(r;θ) ≤ H(r|θ) + H(θ)
   H(r) = H(r|θ) + I(r;θ) ≥ H(r|θ)

   =>   H(r|θ) ≤ H(r) ≤ H(r|θ) + H(θ)

Why is this useful?

   H(r|θ) = ∑θ p(θ) H(∏i p(ri|θ))
          = ∑θ p(θ) ∑i H1(ri|θ)
          = N ∑θ p(θ) H1(r|θ)

where H1(r|θ) = −∑r p(r|θ) log p(r|θ); there are only two terms in this sum!!!

In the bound H(r|θ) ≤ H(r) ≤ H(r|θ) + H(θ), the left-hand side is order(N) and easy to compute, while H(θ) is order(1). So in the large N limit,

   H(r) ≈ H(r|θ).
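To see how the two pieces scale, here is a small sketch (mine, not from the talk) for a discrete θ: it computes the order-N term H(r|θ) = N ∑θ p(θ) H1(r|θ) and the order-1 term H(θ), which together bracket H(r). The parameter values are made up.

```python
import numpy as np

def binary_entropy(rho):
    """Entropy (nats) of a single +/-1 neuron with mean rho."""
    q = (1.0 + rho) / 2.0
    return -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))

def entropy_bounds(p_theta, rho_theta, N):
    """Return (H(r|theta), H(r|theta) + H(theta)), which bracket the true H(r)."""
    p_theta = np.asarray(p_theta)
    rho_theta = np.asarray(rho_theta)
    H_r_given_theta = N * np.sum(p_theta * binary_entropy(rho_theta))   # order N, easy
    H_theta = -np.sum(p_theta * np.log(p_theta))                        # order 1
    return H_r_given_theta, H_r_given_theta + H_theta

# Made-up two-state example: p(theta) = (0.7, 0.3), rho(theta) = (-0.9, -0.5).
for N in (10, 100, 1000):
    lo, hi = entropy_bounds([0.7, 0.3], [-0.9, -0.5], N)
    print(f"N = {N:5d}:  {lo:10.3f} <= H(r) <= {hi:10.3f}   (gap per neuron = {(hi - lo)/N:.2e})")
```

The gap between the bounds is the same for every N, so per neuron it vanishes as N grows, which is the content of H(r) ≈ H(r|θ).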

Two maximum entropy models:

1. p1(r|h'), which captures first moments:

   ∑r ri p(r) = ∑r ri p1(r|h')

2. p2(r|h, J), which captures first and second moments:

   ∑r ri p(r) = ∑r ri p2(r|h, J)
   ∑r ri rj p(r) = ∑r ri rj p2(r|h, J)

The models:

   p1(r|h') = exp(h' ∑i ri) / Z
   p2(r|h, J) = exp(h ∑i ri + (J/2N) ∑ij ri rj) / Z

The true distribution:

   p(r) = ∑θ p(θ) ∏i p(ri|θ)

p1(r|h') = exp(h' ∑i ri) / Z   =>   all neurons have the same mean:

   ⟨ri⟩ ≡ ρ, independent of i.

ρ completely specifies p1(r|h').

p(r) = ∑θ p(θ) ∏i p(ri|θ)   =>   conditioned on θ, all neurons have the same mean:

   ⟨ri⟩θ ≡ ∑ri ri p(ri|θ) ≡ ρ(θ).

ρ(θ) and p(θ) completely specify p(r).
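A quick illustrative sketch (made-up numbers) of this point: sample from p(r) = ∑θ p(θ) ∏i p(ri|θ) given only p(θ) and ρ(θ), and check that the first and second moments come out as the mixture predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
p_theta = np.array([0.7, 0.3])      # hypothetical p(theta)
rho     = np.array([-0.9, -0.5])    # hypothetical rho(theta) = mean of r_i given theta
N, T = 100, 50000                   # neurons, samples

theta = rng.choice(len(p_theta), size=T, p=p_theta)      # draw theta for each sample
q = (1.0 + rho[theta]) / 2.0                             # P(r_i = +1 | theta)
r = np.where(rng.random((T, N)) < q[:, None], 1, -1)     # conditionally independent neurons

print("mean <r_i>    :", r.mean(), "vs", p_theta @ rho)
print("cov <r_i r_j> :", (r[:, 0] * r[:, 1]).mean() - r[:, 0].mean() * r[:, 1].mean(),
      "vs", p_theta @ rho**2 - (p_theta @ rho) ** 2)
```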

[Figure: schematic distributions of the mean activity on (−1, 1). p1(r|h') puts all its weight at ρ; the true p(r) has a peak of weight p(θj) at each ρ(θj); p2(r|h, J) has two peaks, with weight p2− at −ρ2 and weight p2+ at +ρ2.]

Conclusion #1:

The "pairwise" maximum entropy distribution (p2) does not do a very good job of matching the true distribution.

Whether or not this is true for more complex distributions is not known.

A simple case: p(θ) consists of two terms.

[Figure: as above, with two θ values. The true p(r) has peaks of weight p(θ1) at ρ(θ1) and p(θ2) at ρ(θ2); p1(r|h') sits at ρ; p2(r|h, J) has peaks of weight p2− at −ρ2 and p2+ at +ρ2.]

Three parameters: ρ, p(θ2), ρ2.

In terms of more intuitive parameters (firing rate ν, bin size τ):

   ρ = (+1)×ντ + (−1)×(1 − ντ) = 2ντ − 1

   ρ2² − ρ² ≡ δ² = ⟨ri rj⟩ − ⟨ri⟩⟨rj⟩,   i ≠ j

Model parameters: ντ, δ, p(θ2).

Goodness of fit:

   D(p(r)||p1(r|h')) = H1 − H
   D(p(r)||p2(r|h, J)) = H2 − H

The picture: [a number line starting at 0, with the entropies ordered H ≤ H2 ≤ H1.]

What's a good cost function?

For the independent distribution, p1(r|h'), the relevant gap is H1 − H; for the pairwise distribution, p2(r|h, J), it is H2 − H. A natural cost function is the fraction of the independent model's gap that the pairwise model closes:

   (H1 − H2) / (H1 − H)

which is 1 when p2 matches the true distribution and 0 when p2 does no better than p1.
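To make this concrete, here is a numerical sketch (my own, not the talk's large-N calculation) that computes H1, H2, and H exactly for finite N, exploiting the fact that every distribution here depends on r only through ∑i ri. The convention used to pin down ρ(θ1) and ρ(θ2) from (ρ, δ, p(θ2)) is one reasonable choice and is not recoverable from the transcript, so the printed ratio is only qualitatively comparable to the curves in the figures below.

```python
import numpy as np
from scipy.special import gammaln, logsumexp
from scipy.optimize import fsolve

def binary_entropy(rho):
    """Entropy (nats) of one +/-1 neuron with mean rho."""
    q = (1.0 + rho) / 2.0
    return -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))

def mixture_entropy(p_theta, rho_theta, N):
    """Exact entropy of p(r) = sum_theta p(theta) prod_i p(r_i|theta).
    p(r) depends only on k = number of +1 responses, so the sum over 2^N words
    collapses to N+1 terms."""
    k = np.arange(N + 1)
    log_binom = gammaln(N + 1) - gammaln(k + 1) - gammaln(N - k + 1)
    q = (1.0 + np.asarray(rho_theta)) / 2.0               # P(r_i = +1 | theta)
    log_pword = logsumexp(np.log(p_theta)[:, None]
                          + k * np.log(q)[:, None]
                          + (N - k) * np.log(1.0 - q)[:, None], axis=0)
    return -np.sum(np.exp(log_binom + log_pword) * log_pword)

def pairwise_model_entropy(rho, delta, N):
    """Fit p2(r) ~ exp(h*sum_i r_i + (J/2N)*(sum_i r_i)^2) to the first two
    moments of the true model, and return its entropy H2."""
    k = np.arange(N + 1)
    m = 2.0 * k - N                                       # possible values of sum_i r_i
    log_binom = gammaln(N + 1) - gammaln(k + 1) - gammaln(N - k + 1)
    m1_target = N * rho                                   # <sum r_i> under the true model
    m2_target = N + N * (N - 1) * (rho**2 + delta**2)     # <(sum r_i)^2> under the true model

    def moments(params):
        h, J = params
        logw = log_binom + h * m + (J / (2.0 * N)) * m**2
        logZ = logsumexp(logw)
        p = np.exp(logw - logZ)                           # probability of each value of sum r_i
        return p @ m, p @ m**2, logZ

    def residual(params):
        m1, m2, _ = moments(params)
        return [(m1 - m1_target) / N, (m2 - m2_target) / N**2]

    h, J = fsolve(residual, x0=[np.arctanh(rho), 0.0])
    m1, m2, logZ = moments((h, J))
    return logZ - h * m1 - (J / (2.0 * N)) * m2           # H2 = -<log p2>

# Hypothetical example (made-up values).
nu, tau, delta, p2 = 2.0, 0.020, 0.05, 0.2     # rate (Hz), bin (s), sqrt(covariance), p(theta_2)
rho = 2.0 * nu * tau - 1.0                      # mean response
N = 100

p_theta = np.array([1.0 - p2, p2])
# One way to choose the two conditional means so the mixture has mean rho and covariance delta^2:
rho_theta = np.array([rho - delta * np.sqrt(p2 / (1.0 - p2)),
                      rho + delta * np.sqrt((1.0 - p2) / p2)])

H1 = N * binary_entropy(rho)                    # independent maxent model (matches the mean only)
H  = mixture_entropy(p_theta, rho_theta, N)     # true entropy
H2 = pairwise_model_entropy(rho, delta, N)      # pairwise maxent model (matches both moments)
print(f"H1 = {H1:.2f}, H2 = {H2:.2f}, H = {H:.2f}, (H1-H2)/(H1-H) = {(H1 - H2)/(H1 - H):.3f}")
```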

[Figure: entropy relative to H1 vs. p(θ2), for τ = 20 ms, ν = 25 Hz, δ = 0.1. The curves for H, H2, and H1 all lie within about 1% of each other: H2 ≈ H, and both are ≈ H1, so (H1 − H2)/(H1 − H) ≈ 1.]

[Figure: entropy relative to H1 vs. p(θ2), for τ = 20 ms, ν = 2 Hz, δ = 0.05. Here H falls well below H2 and H1, and (H1 − H2)/(H1 − H) ≈ 0.2.]

[Figure: entropy relative to H1 vs. p(θ2), for τ = 20 ms, ν = 2 Hz, δ = 0.1; curves for H, H2, and H1.]

[Figure: entropy relative to H1 vs. p(θ2), for τ = 20 ms, ν = 2 Hz, δ = 0.2; curves for H, H2, and H1.]

A better approach to determining goodness of fit.

What we're computing is p(r|s, λm) for model m, which gives, via Bayes,

   p(s|r, λm) = p(r|s, λm) p(s) / normalization.

What we should be comparing are the posteriors:

   p(s|r, λ1) and p(s|r, λ2).

It doesn't make sense to spend a huge amount of time and effort finding the ultimate model for p(r|s, λ) if that's not going to improve p(s|r, λ).
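A minimal sketch of what "compare posteriors" could look like in practice (entirely illustrative: the stimuli, the "true" truncated-Poisson likelihoods, and the single-neuron maxent model are all made up): compute p(s|r, λ) via Bayes under the true and model likelihoods, and measure the average KL divergence between the two posteriors.

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import brentq

r_vals = np.arange(21)
stims = [2.0, 6.0, 12.0]                      # made-up mean spike counts for 3 stimuli
p_s = np.full(len(stims), 1.0 / len(stims))   # uniform prior over stimuli

def true_likelihood(mean):                    # "true" p(r|s): truncated Poisson (toy choice)
    p = poisson.pmf(r_vals, mean)
    return p / p.sum()

def maxent_likelihood(mean):                  # model p(r|s,lam) = exp(-lam*r)/Z fit to the mean
    model_mean = lambda lam: np.sum(r_vals * np.exp(-lam * r_vals)) / np.sum(np.exp(-lam * r_vals))
    lam = brentq(lambda l: model_mean(l) - mean, -5.0, 5.0)
    p = np.exp(-lam * r_vals)
    return p / p.sum()

def posterior(likelihoods, r):                # p(s|r) = p(r|s) p(s) / normalization
    post = np.array([lik[r] for lik in likelihoods]) * p_s
    return post / post.sum()

true_lik  = [true_likelihood(m) for m in stims]
model_lik = [maxent_likelihood(m) for m in stims]

# Average KL between posteriors, weighted by how often each response actually occurs.
p_r = np.sum([ps * lik for ps, lik in zip(p_s, true_lik)], axis=0)
kl = 0.0
for r in r_vals:
    pt, pm = posterior(true_lik, r), posterior(model_lik, r)
    kl += p_r[r] * np.sum(pt * np.log(pt / pm))
print(f"average KL between true and model posteriors: {kl:.4f} nats")
```

A small average KL says the approximate likelihood is already good enough for decoding, regardless of how it scores on p(r|s) itself.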

Conclusions

1. Maximum entropy => Maximum likelihood.

2. For at least one maxent model, pairwise correlations don't match the true distribution very well.

3. One needs to be very careful about assessing models: compare posteriors!!!

4. For binary neurons, the pairwise maxent model is intractable.

5. Wherever possible, use point process models. After all, spike timing is irrelevant in the brain.