Theorem 6.2 is the AEP for strong typicality, which says that there exists eta greater than 0, such that eta tends to 0 as theta tends to 0 and the following hold. First, if x is strongly delta-typical, then the probability of x is lower bounded by 2 to the power minus n times entropy of X plus eta, and is upper bounded by 2 to the power minus n times entropy of X minus eta. Second, for n sufficiently large, the probability that a random sequence X, is strongly delta-typical, is greater than 1 minus delta. Third, for n sufficiently large, the size that the strongly delta-typical set, is lower bounded by 1 minus delta times 2 to the power n times entropy of X minus eta, and upper bounded by 2 to the power n times entropy of X plus eta. [BLANK_AUDIO] Note that the form of the strong AEP, is very similar to the form of the weak AEP. And the interpretation is also similar. [BLANK_AUDIO] We are going to discuss the proof for each part of the strong AEP. Here we first give the proof idea for the first part. If x is strongly typical, then the empirical distribution is quote unquote about right. [BLANK_AUDIO] If the empirical distribution is about right, then everything else, including the empirical entropy, would be about right. That is, minus 1 over n times log of the probability of the sequence, is approximately equal to the entropy of X. And this is equivalent to p(x) approximately equal to 2 to the power minus n times entropy of X. [BLANK_AUDIO] Let us now prove the first part, that is property one of the strong AEP. [BLANK_AUDIO] For any sequence x which is delta-typical, we have p of the sequence x equals p(x_1) times p(x_2) all the way to p(x_n). And this can be written as the product of all x in the support of x, p(x) to the power N, x semicolon sequence x. Here we only need to take a product over all x in the support, because the number of occurrences of x in a sequence, is equal to 0, for all x not in the support. [BLANK_AUDIO] And therefore we see that, the probability of a typical sequence is always strictly positive. Then we take a logarithm of the probability of the sequence. This is equal to the logarithm of a product over all x in S_X. And so, it is equal to summation x, log of p(x) to the power N x semicolon sequence x, which is equal to N x semicolon sequence x, log of p(x). Note that we adopt the convention that summation x, means summation x in the support S_X. Now we deliberately subtract the quantity n times p(x) and add back the quantity n time p(x), so that we have done nothing. Now we multiple n times p(x) to log p(x), to obtain n times summation x, p(x) log p(x), and then multiply N x semicolon sequence x minus n times p(x) to log p(x) to obtain minus n summation x 1 over n, N x semicolon x sequence, minus p(x) times minus log p(x). Note that we have divide everything through by n, and then we multiply everything by n. And also this minus sign, and this minus sign, cancel with each other. [BLANK_AUDIO] Now summation x p(x) log p(x), is just minus the entropy of X, so we can write, minus n times entropy of X. And for the other summation, we simply copy it over. [BLANK_AUDIO] Since x is delta-typical, summation x to absolute difference between the relative frequency of x and the probability of x is less than or equal to delta. [BLANK_AUDIO] Now consider this summation in equation 1 and take the absolute value. [BLANK_AUDIO] By the triangular inequality, this is upper bounded by summation x, the absolute difference between the relative frequency of x and the probability of x, multiplied by minus log p(x). Now, minus log p(x) can be upper bounded by minus log minimum over all x in the support p(x). And this summation, by the assumption that x is delta-typical, is upper bounded by delta. For log minimal x p(x), we just copy it over. Now we call this upper bound eta. And obviously this is greater than 0. So we have proved that the absolute value of the summation in equation one is less than or equal to eta. Therefore, the summation is lower bounded by minus eta and upper bounded by eta. This is illustrated in equation 1. Now it then follows from equation 1, that log of p(x) is lower bounded by minus n times entropy of X plus eta, where this lower bound is obtained, by taking the upper bound eta on the summation in equation 1. On the other hand, log of p(x) is upper bounded by minus n times the entropy of X minus eta. And this upper bound, is obtained, by taking a lower bound minus eta on the summation in equation 1. So, we have the probability of the sequence x is lower bounded by 2 to the power minus n times entropy of X plus eta, and upper bounded by 2 to the power minus n times entropy of X minus eta, where eta, defined as minus delta, log of the minimum over all x in the support, p(x), where this logarithm is a constant, tends to 0 as delta tends to 0. And this proves property one. [BLANK_AUDIO] Let us now discuss the second property of the strong AEP, which says that for n sufficiently large, the probability, the probability of getting a strongly delta-typical sequence is very close to 1. Specifically greater than 1 minus delta. The idea of this property is actually very simple, although the details of the proof is a little hairy. Basically, by the Weak Law of Large Numbers with probability tending to one, as n goes to infinity, the empirical distribution of the random sequence X is close to true distribution p(x). And so by definition X is strongly typical. [BLANK_AUDIO] The proof goes as follows. To prove Property two, we write N x semicolon random sequence X equals summation k equals 1 up to n, B_k(x), where, B_k(x), is an indicator random variable, which is equal to 1, if X_k is equal to x, and 0 if X_k is not equal to x. So B_k(x) tells whether X_k is equal to x or not. And the summation k equals 1 up to n, B_k(x) simply counts number of occurrences of small x in the random sequence X. [BLANK_AUDIO] Then B_k(x), k equals 1 up to n are i.i.d random variables, because X_1, X_2, up to X_n, are i.i.d random variables. Specifically the probability that B_k(x) equals 1, that is, X_k equals x, is equal to p(x). And the probability that B_k(x) equals 0 is equal to 1 minus p(x). Note that, the expectation of B_k(x), is equal to, 1 minus p(x) times 0, plus p(x) times 1, which is equal to p(x). In general, the expectation of an indicator random variable, of an event, is equal to the probability of that event. [BLANK_AUDIO] Now, by the Weak Law of Large Numbers, for any delta greater than 0 and for any x in the alphabet, we have, the probability that the absolute value of 1 over n summation k equals 1 up to n, B_k(x), namely, the average number of occurrences of the value x in a random sequence x, minus p(x), the probability of occurrence of the value x, greater than delta divided by the size of the alphabet, is less than delta divided by the size of the alphabet, for n sufficiently large. In other words, the probability that the relative frequency of X deviates from the true probability p(x), by a small amount is very close to 0. Then, the probability that the absolute value of 1 over n, N of x semicolon X minus p(x), greater than delta divided by the size of the alphabet for some x, by our definition, is equal to, the probability that, the absolute value, of 1 over n, summation k equals 1 up to n, B_k(x), minus p(x) greater than delta divided by, the size of the alphabet, for some x. Now this is the probability of a certain event, for some x, and hence, we can write this as the probability of the union of the event over all x. By the union bound, which says that, the probability of A union B, is upper-bounded by the probability of A plus the probability of B, we obtain the upper bound summation x, the probability of the event, for a particular x. Now the probability of this event for a particular x is upper bounded by delta divided by the size of X. And so, we have summation x delta divided by the size of X. Because the number of terms in the summation is exactly equal to the size of X, this is equal to delta. [BLANK_AUDIO] Now the event summation x absolute value, 1 over n, N x semicolon X minus p(x) greater than delta, implies the event absolute value, 1 over n, N x semicolon X minus p(x), greater than delta divided by the size of the alphabet, for some x. Let us mark this event in yellow, and this event in green. Then we have the probability of the yellow event, is upper bounded by the probability of the green event. Therefore, the probability that X is strongly delta-typical, which is equal to the probability that summation x, absolute value, 1 over n, N of x semicolon X minus p(x) less than or equal to delta, is equal to 1 minus the probability that the summation is greater than delta. And this is the yellow event. And so, this is greater than or equal to 1 minus the probability of the green event, which we have shown is less than delta. And therefore we have obtained the upper bound, 1 minus delta. This proves proposition two. [BLANK_AUDIO] Property 3 of Strong AEP, says that for n sufficiently large, the size of the delta-typical set, is approximately equal to 2 to the power n times the entropy of X. The proof of this property follows from property 1 and property 2 of Strong AEP, in exactly the same way as in theorem 5.3. So we leave this as an exercise. [BLANK_AUDIO] Theorem 6.3 is a stronger result, which says that for sufficiently large n, there exists some function phi of delta, which is strictly positive, such that the probability of not getting a delta-typical sequence, is less than 2 to the power minus n times phi of delta. This means that this probability not only tends to 0 as n tends to infinity, but it tends to 0 exponentially fast. The proof of this theorem invokes the Chernoff bound. [BLANK_AUDIO] Since we are not going to use this theorem for the rest of the course, we omit the proof here. [BLANK_AUDIO]