Theorem 6.2 is the AEP for strong typicality,

which says that there exists eta greater than

0, such that eta tends to 0 as delta tends to

0 and the following hold. First, if x is strongly

delta-typical, then the probability of x is lower

bounded by 2 to the power minus n times the quantity entropy of X plus eta, and is

upper bounded by 2 to the power minus n times the quantity entropy of X minus eta.

Second, for n sufficiently large, the probability that a random sequence

X is strongly delta-typical is greater than 1 minus delta.

Third, for n

sufficiently large, the size of the strongly delta-typical set is lower

bounded by 1 minus delta, times 2 to the power n times the quantity entropy of X minus eta,

and upper bounded by 2 to the power n times the quantity entropy of X plus eta.
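
For reference, the three properties just stated can be written in symbols as follows. This is a sketch of the spoken statement; the symbol T^n_{[X]\delta} for the strongly delta-typical set is a notation assumed here, since the transcript does not name it.

```latex
\begin{align*}
&\text{(1)}\quad 2^{-n(H(X)+\eta)} \le p(\mathbf{x}) \le 2^{-n(H(X)-\eta)}
   \quad \text{for all } \mathbf{x} \in T^n_{[X]\delta},\\
&\text{(2)}\quad \Pr\{\mathbf{X} \in T^n_{[X]\delta}\} > 1-\delta
   \quad \text{for } n \text{ sufficiently large},\\
&\text{(3)}\quad (1-\delta)\,2^{n(H(X)-\eta)} \le \bigl|T^n_{[X]\delta}\bigr| \le 2^{n(H(X)+\eta)}
   \quad \text{for } n \text{ sufficiently large}.
\end{align*}
```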

Note that the form of the strong AEP is very similar to the form of the weak AEP.

And the interpretation is also similar.

We are going to discuss the proof for each part of the strong AEP.

Here we first give the proof idea for the first part.

If x is strongly typical, then the

empirical distribution is "about right."

If the empirical distribution is about right, then everything

else, including the empirical entropy, would be about right.

That is, minus 1 over n times log of the probability of the sequence

is approximately equal to the entropy of X.

And this is equivalent to the probability of the sequence x being approximately

equal to 2 to the power minus n

times entropy of X.
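
As a small numerical illustration of this idea, the sketch below compares the entropy of a source with minus 1 over n times log of the probability of one particular sequence. The distribution `p`, the sequence `seq`, and the function names are made up for the example and do not come from the lecture.

```python
import math

def entropy(p):
    """Entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def empirical_entropy(seq, p):
    """-(1/n) log2 p(seq) for an i.i.d. source with distribution p."""
    return -sum(math.log2(p[a]) for a in seq) / len(seq)

# Hypothetical source distribution.
p = {"a": 0.5, "b": 0.25, "c": 0.25}

# A sequence of length n = 20 whose relative frequencies
# (0.55, 0.25, 0.20) are close to p.
seq = "a" * 11 + "b" * 5 + "c" * 4

h = entropy(p)                      # entropy of X, in bits
h_emp = empirical_entropy(seq, p)   # empirical entropy of seq, in bits
print(h, h_emp)
```

Because the relative frequencies of `seq` are close to `p`, the two printed values are close to each other, which is the "about right" behavior described above.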

Let us now prove the first part, that is property one of the strong AEP.

For any sequence x which is strongly delta-typical,

we have p of the sequence x equals p(x_1)

times p(x_2) all the way to p(x_n). And this

can be written as the product, over all x in the support of X, of

p(x) to the power N of x semicolon

sequence x.

Here we only need to take the product

over all x in the support, because the number

of occurrences of x in a strongly typical sequence is equal to 0 for all x not in the support.
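
The regrouping of the product can be checked on a toy example. The sketch below uses a hypothetical alphabet in which the symbol "d" has probability 0 and so lies outside the support; the names `p`, `seq`, and `support` are assumptions of the example, not part of the lecture.

```python
import math
from collections import Counter

# Hypothetical distribution; "d" has probability 0, so it is not in the support.
p = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 0.0}
seq = "abacabca"

# Product over positions: p(x_1) p(x_2) ... p(x_n).
prob_positions = math.prod(p[a] for a in seq)

# Product over symbols in the support, each raised to its number of
# occurrences N(x; seq) in the sequence.
counts = Counter(seq)
support = [a for a, q in p.items() if q > 0]
prob_symbols = math.prod(p[a] ** counts[a] for a in support)

print(prob_positions, prob_symbols)
```

Here "d" never occurs in `seq`, so leaving it out of the product changes nothing, and both ways of computing the probability agree.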

And therefore we see that the probability of a

strongly typical sequence is always strictly positive.

Then we take a logarithm of the probability of the sequence.

This is equal to the logarithm of a product

over all x in S_X. And so, it is equal

to summation x, log of p(x) to the power N of x semicolon sequence x,

which is equal to summation x, N of x semicolon sequence x, times log of p(x).

Note that we adopt the convention that

summation x means summation over x in the support S_X.
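
Written out, the steps just described read as follows, with the sum over x understood to run over the support S_X and with N(x; \mathbf{x}) denoting the number of occurrences of x in the sequence:

```latex
\begin{align*}
\log p(\mathbf{x})
  &= \log \prod_{x \in S_X} p(x)^{N(x;\mathbf{x})} \\
  &= \sum_{x} \log p(x)^{N(x;\mathbf{x})} \\
  &= \sum_{x} N(x;\mathbf{x}) \log p(x).
\end{align*}
```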