So, clearly, there's going to be a problem in going to this limit of very large word lengths. As the word gets longer and longer, for a finite amount of data you're going to have very few samples of a word of that length. So when one tries to estimate the entropy of the distribution of words of that length, it's very unlikely that you will have seen them all. Not surprisingly, if you now look at the entropy plotted against one over the word length, the entropy drops off in this limit, indicating that the words have not been completely sampled.

What can be done is to compute the entropy for different word lengths, and you can see that these estimates almost form a line. One can simply extrapolate the trend of this line toward infinite word length and extract an estimated value for the entropy in that limit. That's not what was done in this figure, though; this was purely the information captured directly.
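The extrapolation procedure just described can be sketched numerically. This is a toy illustration, not the analysis behind the figure: the spike train is simulated as independent Bernoulli bins (so the true entropy rate is flat at about 0.47 bits per bin), and the plug-in entropy estimator and linear fit are the simplest possible choices.

```python
import numpy as np

def word_entropy(spikes, word_len):
    """Plug-in (naive) entropy, in bits, of the distribution of words."""
    n = len(spikes) // word_len
    words = spikes[: n * word_len].reshape(n, word_len)
    # Count how often each distinct word occurs
    _, counts = np.unique(words, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
spikes = (rng.random(200_000) < 0.1).astype(int)  # toy binary spike train

lengths = np.arange(2, 10)
# Entropy rate estimate (bits per bin) for each word length
rates = np.array([word_entropy(spikes, L) / L for L in lengths])
# Fit entropy rate against 1/L and extrapolate to 1/L -> 0
slope, intercept = np.polyfit(1.0 / lengths, rates, 1)
print(f"extrapolated entropy rate: {intercept:.3f} bits per bin")
```

With real data, the naive estimate bends downward at long word lengths because the words are undersampled, which is exactly why the extrapolation is needed.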

And so one can look over different delta t's and different word lengths to see how the information depends on these parameters. What you should notice is that there is some limit to delta t beyond which the information doesn't grow anymore. As one looks at the words at higher and higher temporal resolution, one takes into account finer and finer details about how those spike patterns are generated; that's what's being quantified as we move down this axis. As the time discretization of the word, the bin size, gets smaller and smaller, it's able to capture more and more of the variability in the spike train that's actually signaling something different about the stimulus. But at some point, it seems, that information stops increasing.

So this red curve, at between about 80 and 100 bits per second, is the information rate. You can see that it stops increasing with delta t at a delta t of about 2 milliseconds.

Hopefully you'll remember, from the jitter in the spike trains that we looked at, that they seemed to be repeatable on a time scale of about 1 or 2 milliseconds. So this time scale, delta t, corresponds to the time scale on which the jitter in the spike train still allows one to read it off as an encoding of the same stimulus. It quantifies, approximately, the temporal width at which one can discretize the spike train and still extract all the information about the stimulus that distinguishes it from other stimuli.
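To make this binning concrete, here is a minimal sketch (the spike times and function name are made up) of discretizing the same spike train at two values of delta t. Two spikes about 1.5 ms apart merge into a single bin at 5 ms resolution but stay distinct at 1 ms resolution:

```python
import numpy as np

def binarize(spike_times, duration, dt):
    """Discretize spike times (in seconds) into a 0/1 train with bin width dt."""
    n_bins = int(np.ceil(duration / dt))
    train = np.zeros(n_bins, dtype=int)
    idx = (np.asarray(spike_times) / dt).astype(int)
    train[idx] = 1  # spikes landing in the same bin collapse into a single 1
    return train

spike_times = [0.0012, 0.0051, 0.0163, 0.0178]  # hypothetical spike times (s)
coarse = binarize(spike_times, duration=0.02, dt=0.005)  # 5 ms bins
fine = binarize(spike_times, duration=0.02, dt=0.001)    # 1 ms bins
print(coarse)      # [1 1 0 1]: the last two spikes merge into one bin
print(fine.sum())  # 4: all four spikes remain distinct
```

Once delta t is small enough to separate every reliably timed spike, shrinking it further adds bins but no new distinctions, which is why the information curve saturates.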

So in this example we've seen one case where we didn't have enough data to be able to sample, say, very long words. In general this is always true: when one is trying to calculate information-theoretic quantities, one needs to know the full distribution of responses and the full distribution of stimuli, and there's simply never enough data to come up with really reliable estimates of information unless one has a very simple experimental setup. So a lot of effort has been put into finding ways to correct the sampled distributions for the fact that there is a finite amount of data, and there has been some very interesting work by a number of groups over the last fifteen years or so that has made significant advances in computing information-theoretic quantities from finite amounts of data.

Now we're going to turn to a different approach, this one proposed by Brenner and colleagues.

How much does the observation of a single spike tell us about the stimulus?

Now, this is similar to the case that we started with at the beginning of this lecture, but here we're going to address the question that we noted then: what if we don't know exactly what it is about the stimulus that triggered the spike? It turns out that, as in the case we just went through, it is straightforward to compute information without any explicit knowledge of what exactly in the input is being encoded. This is because the mutual information gives us a way to quantify the relationship between input and output without needing to make any particular model of that relationship.

So the paradigm is exactly the same as before: we're going to compute the entropy of the responses when the stimulus is random, and the entropy given a specific stimulus. Here things are a little simpler than in the case of words. Without knowing the stimulus, the probability that a single spike occurred in a bin is given by the average firing rate times the bin size.

Similarly, the probability of no spike is just 1 minus that.

Now, the probability of a spike at a given time during the presentation of a stimulus is r of t times the bin size, where r of t is the time-varying rate caused by the changing stimulus. We can get an estimate of that time-varying rate by repeating the input over and over again.

The variability in these responses means that these events show a continuous variation and have some width, as we saw before, depending on the jitter in the spike times. So let's go ahead and compute the entropy. We're going to define, for the moment, p to be r-bar delta t, and p of t to be r of t delta t.

The information will simply be the difference between the total entropy, which we've already computed at the beginning of the lecture for this binomial case, minus p log p minus (1 minus p) log (1 minus p), and the noise entropy, which we need to subtract from it. The noise entropy takes on a value at every time t, depending on the time-varying firing rate. Again, every time t represents a sample of a stimulus s, and averaging over time is equivalent to averaging over the distribution of s. This ability to swap an average over the ensemble of stimuli for an average over time is known as ergodicity: different values of s are visited in time with a frequency equivalent to their probability.

So now that we have our expression for the information between response and stimulus, we can do some manipulations on it. We replace p by r delta t. We can take the time average of the firing rate to be equal to the mean firing rate; that's equivalent here to the integral over the probability as a function of time going toward that mean firing rate. And getting rid of some small terms, a couple of extra pieces that turn out to be small, we end up with a rather neat expression for the information per spike.
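That expression can be written out explicitly. The following is a reconstruction from the verbal derivation above, matching the standard single-spike information formula; here r-bar is the mean rate, r(t) the time-varying rate, T the duration of the repeated stimulus, and delta t the bin size:

```latex
% Total entropy per bin (spike or no spike), with p = \bar{r}\,\Delta t:
H_{\mathrm{total}} = -p\log_2 p - (1-p)\log_2(1-p)
% Noise entropy: the same expression with p(t) = r(t)\,\Delta t,
% averaged over the stimulus presentation time T:
H_{\mathrm{noise}} = \frac{1}{T}\int_0^T \! dt \,
  \bigl[\, -p(t)\log_2 p(t) - (1-p(t))\log_2(1-p(t)) \,\bigr]
% Subtracting, and dropping terms that vanish as \Delta t \to 0,
% the information carried per spike is
I_{\mathrm{per\,spike}} = \frac{1}{T}\int_0^T \! dt \,
  \frac{r(t)}{\bar{r}} \log_2 \frac{r(t)}{\bar{r}}
```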

Let's take a closer look at this expression. As we've emphasized already, this method of computing information has no explicit stimulus dependence, meaning there's no need for any explicit encoding or decoding model. It relies on the repeated part of the stimulus being a good sample of the distribution of possible stimuli. Note also that although we computed this for the arrival or not of a single spike, this formalism could be applied to the rate of any event, for example the occurrence of a specific symbol in the code. So this is a way to evaluate how much information might be conveyed by a particular pattern of spikes, for example a certain interspike interval. We can also examine what determines the amount of information in the spike train.

Looking again at this expression, we can see that the information is going to be determined by two things. One is timing precision: losing it is going to blur this function r of t. If events are blurred, so that r of t increases and decreases slowly without reaching large values, that will reduce the information. At the extreme, let's imagine that the response is barely modulated at all by this particular stimulus. In that case r of t goes toward the average firing rate, and one gets no information. The more sharply and strongly modulated r of t is, the more information it contains.

The other factor is the mean firing rate. If the spike rate is very low, then the average firing rate is small and the information per spike is likely to be large. The intuition is that a low firing rate signifies that the neuron responds to a very small number of possible stimuli, so that when it does spike, it's extremely informative about the stimulus. Note, though, that this is the information per spike; the information transmitted as a function of time, that is, the information rate, is going to be small for such a neuron.

So let's look at some hypothetical examples. Rat hippocampal neurons have what's known as a place field, such that when the rat runs through that region in space, the cell fires. Let's imagine the place field looks like this. As the rat runs around, it's going to pass through that place field, and what's the firing rate going to look like? As the rat moves through the field, the rate is going to go from zero, ramp up kind of slowly, and come down again. Because that place field is quite large, the rat is likely to pass through it fairly often. So we're going to get some r of t of that form.

Now let's imagine that the place field is very small. The rat runs around and very, very rarely passes through that place field. So now we're going to get almost no firing, and then some blip of firing as the rat passes through the field. Now, what if the edges of the place field are very sharp? Again the rat runs around and passes through that place field very rarely, but when it does, the firing rate increases very sharply toward its maximum. That's going to increase the information we get from such a receptive field.

Okay, so now we're done with computing information in spike trains.

Next up we'll be talking about information and coding efficiency.

We'll be looking at natural stimuli. What are the challenges posed to our

nervous systems by natural stimuli? What do information-theoretic concepts

suggest that neural systems should do when they encode such stimuli?

And finally, what principles seem to be at work in shaping the neural code?