In Section 6.3, we introduce and discuss the notion of joint typicality.

Here is the setup. Consider a bivariate random process {(X_k, Y_k) : k >= 1}, where the pairs (X_k, Y_k) are i.i.d. with generic joint distribution p(x,y). Let (X,Y) denote the pair of generic random variables, with finite joint entropy. We also assume that the alphabets of X and Y are both finite.

Here is some notation. Consider a pair of sequences (x, y), each of length n. Let N(x,y; x, y) be the number of occurrences of the pair of values (x,y) in the pair of sequences (x, y). Dividing this by n gives the relative frequency of the pair of values (x,y) in the pair of sequences (x, y). The collection of these relative frequencies is called the empirical distribution of the pair of sequences (x, y).

Here is an example. Let x be the sequence 001011 and y be the sequence 010110. Reading the two sequences component-wise, the number of occurrences of (0,0) is 1, of (0,1) is 2, of (1,0) is 2, and of (1,1) is 1.

In Definition 6.6, we define the strongly jointly typical set. The strongly jointly typical set T_{XY delta}^n with respect to the generic distribution p(x,y) is the set of pairs of sequences (x, y) such that N(x,y; x, y) = 0 for (x,y) not in the support of (X,Y), and

sum over x, sum over y of | (1/n) N(x,y; x, y) - p(x,y) | <= delta,

where delta is an arbitrarily small positive real number. A pair of sequences (x, y) is called strongly jointly delta-typical if it is in T_{XY delta}^n.

Theorem 6.7 is a property called consistency.
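The counting in the example above (x = 001011, y = 010110) is easy to verify with a short script. The Python sketch below, not part of the lecture, tallies the pair occurrences N(x,y; x, y) and the resulting empirical distribution.

```python
from collections import Counter

# The two length-6 binary sequences from the example.
x = "001011"
y = "010110"
n = len(x)

# N(a,b; x, y): number of positions k where (x_k, y_k) = (a, b).
counts = Counter(zip(x, y))

# Empirical distribution: relative frequency of each pair of values.
empirical = {pair: c / n for pair, c in counts.items()}

print(counts[("0", "0")], counts[("0", "1")],
      counts[("1", "0")], counts[("1", "1")])
# -> 1 2 2 1
```

The relative frequencies sum to 1, as an empirical distribution must.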
It says that if a pair of sequences (x, y) is jointly delta-typical with respect to p(x,y), then x is typical with respect to the marginal p(x), and y is typical with respect to the marginal p(y).

Theorem 6.8 is called the preservation property. Consider a random variable X, and let Y = f(X). If a sequence x = (x_1, x_2, ..., x_n) is delta-typical with respect to the distribution p(x), then the sequence f(x), obtained by applying the function f to x component-wise, is delta-typical with respect to the distribution p(y). The proofs of these two theorems are rather straightforward; for the details, please see the textbook.

Theorem 6.9 is the Strong Joint AEP. Consider a pair of random sequences (X, Y) with components (X_1,Y_1), (X_2,Y_2), ..., (X_n,Y_n), where the (X_i,Y_i) are i.i.d. with generic pair of random variables (X,Y). Then there exists lambda > 0, with lambda -> 0 as delta -> 0, such that the following hold.

First, if a pair of sequences (x, y) is jointly delta-typical, then the joint probability p(x, y) is lower bounded by 2^{-n(H(X,Y) + lambda)} and upper bounded by 2^{-n(H(X,Y) - lambda)}.

Second, for n sufficiently large, the probability that the pair of random sequences (X, Y) is jointly delta-typical is greater than 1 - delta.

Third, for n sufficiently large, the size of the jointly delta-typical set is lower bounded by (1 - delta) 2^{n(H(X,Y) - lambda)} and upper bounded by 2^{n(H(X,Y) + lambda)}.

The proof of the Strong Joint AEP is essentially the same as the proof of the Strong AEP for a single sequence, so it is omitted.

Let us now discuss a very fundamental tool called Stirling's approximation.
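The first property of the Strong Joint AEP says, equivalently, that -(1/n) log2 p(x, y) is within lambda of H(X,Y) for a jointly typical pair, and the second property says this happens with high probability. The Python sketch below illustrates this concentration by Monte Carlo, using an assumed joint distribution on {0,1} x {0,1} that is not part of the lecture.

```python
import math
import random

random.seed(0)

# An assumed joint distribution p(x, y) on {0,1} x {0,1}, for illustration only.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Joint entropy H(X, Y) in bits.
H = -sum(q * math.log2(q) for q in p.values())

# Draw n i.i.d. pairs and compute -(1/n) log2 p(x, y) for the sampled sequence.
n = 100_000
pairs = random.choices(list(p), weights=list(p.values()), k=n)
empirical_rate = -sum(math.log2(p[pair]) for pair in pairs) / n

# By the Strong Joint AEP, empirical_rate should be close to H for large n.
print(round(H, 4), round(empirical_rate, 4))
```

By the law of large numbers the gap shrinks as n grows, which is exactly what the lambda in the theorem captures.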
In Lemma 6.11, we present a simplified version of it, which says that for large n, the natural log of n factorial is approximately n log n, i.e., ln n! ≈ n ln n.

The proof goes as follows. First, write ln n! = ln 1 + ln 2 + ... + ln n. Since ln x is monotonically increasing, we have the following bounds on the k-th term: ln k is lower bounded by the integral of ln x dx from x = k - 1 to x = k, and upper bounded by the integral of ln x dx from x = k to x = k + 1.

These bounds on ln k can be visualized graphically. The lower bound is valid because ln k, the area of the green bar, is clearly at least the integral of ln x dx for x from k - 1 to k. The upper bound is valid because ln k, the area of the red bar, is clearly at most the integral of ln x dx for x from k to k + 1.

Coming back to our proof, summing these lower and upper bounds on ln k for k = 1 up to n, we have ln n! lower bounded by the integral of ln x dx from x = 0 to n, and upper bounded by the integral of ln x dx from x = 1 to n + 1. Upon evaluating the integrals, we have

n ln n - n <= ln n! <= (n + 1) ln(n + 1) - n.

Note that we have n ln n in the lower bound and (n + 1) ln(n + 1) in the upper bound; both are much bigger than n when n is large. So for large n, we can approximate ln n! by n ln n.

Let us now discuss a very useful approximation for the binomial coefficient, which says that for large n,

C(n, np), also written as the multinomial coefficient (n; np, n(1-p)), is approximately 2^{n H(p, 1-p)},

where H(p, 1-p) is the entropy of the binary distribution (p, 1-p) in bits, and np is assumed to be an integer. For the proof, consider the binomial coefficient C(n, np) = n! / ( (np)! (n(1-p))! ).
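The two-sided bound n ln n - n <= ln n! <= (n + 1) ln(n + 1) - n derived above can be checked numerically. In the Python sketch below, ln n! is computed via the log-gamma function, since lgamma(n + 1) = ln n!.

```python
import math

n = 1_000_000

# ln(n!) computed (up to floating point) via the log-gamma function.
ln_fact = math.lgamma(n + 1)

# The bounds obtained by integrating ln x, as in the proof above.
lower = n * math.log(n) - n
upper = (n + 1) * math.log(n + 1) - n

print(lower <= ln_fact <= upper)              # the sandwich holds
print(round(ln_fact / (n * math.log(n)), 4))  # ratio tends to 1, slowly
```

The ratio ln n! / (n ln n) approaches 1 only at rate 1 - 1/ln n, which is why the cruder statement ln n! ≈ n ln n is meant in the sense of leading-order growth.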
Taking the natural logarithm, we have ln C(n, np) = ln n! - ln (np)! - ln (n(1-p))!. We now apply Stirling's approximation to each of these terms: ln n! ≈ n ln n, ln (np)! ≈ np ln(np), and ln (n(1-p))! ≈ n(1-p) ln(n(1-p)).

Now write ln(np) as ln n + ln p, and ln(n(1-p)) as ln n + ln(1-p). Observe that the terms n ln n, -np ln n, and -n(1-p) ln n cancel each other. Upon factoring n out of the remaining terms, we have n ( -p ln p - (1-p) ln(1-p) ), which is n times the entropy of the binary distribution (p, 1-p) in the natural logarithm. Changing the base of the logarithm to 2, we have log2 C(n, np) approximately equal to n times the entropy of the binary distribution (p, 1-p) in bits, and this proves the lemma.

This approximation of the binomial coefficient can be generalized to the multinomial coefficient. Let (p_1, p_2, ..., p_m) be a probability mass function. For large n, the multinomial coefficient (n; np_1, np_2, ..., np_m), which is equal to n! divided by the product over all i of (np_i)!, is approximately equal to 2^{n H(p_1, ..., p_m)}, with the entropy in bits. Again, we assume that np_1, np_2, etc. are all integers.
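The approximation C(n, np) ≈ 2^{n H(p, 1-p)} can also be checked directly: the exponential growth rate (1/n) log2 C(n, np) should be close to the binary entropy. The Python sketch below does this for an assumed n and p (my choice, not from the lecture).

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of the distribution (p, 1 - p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.3          # chosen so that np = 300 is an integer
k = round(n * p)

# Exponential growth rate of the binomial coefficient, in bits per symbol.
rate = math.log2(math.comb(n, k)) / n

print(round(binary_entropy(p), 4), round(rate, 4))
```

The residual gap is of order (log n)/n, coming from the lower-order terms dropped in Stirling's approximation, so the two numbers agree to about two decimal places already at n = 1000.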