We have finished discussing the binary Huffman procedure; we now discuss the D-ary Huffman procedure. In this procedure, the smallest D probability masses are merged in each step, forming an internal node of the resulting code tree. If the resulting code tree is formed in k + 1 steps, then there will be k + 1 internal nodes and D + k(D − 1) leaves, where each leaf corresponds to a source symbol in the alphabet. If the alphabet size has the form D + k(D − 1), then apply the Huffman procedure directly. Otherwise, add a few dummy symbols with probability 0 to the alphabet, in order to make the total number of symbols have the form D plus k times
D minus 1. Here is an example with D = 3 and k = 2, together with a possible code tree. In the Huffman procedure, the number of steps is k + 1 = 3, corresponding to the formation of the three internal nodes. The number of leaves is D + k(D − 1) = 3 + 2 × 2 = 7.

We now discuss an upper bound on the expected length of a Huffman code. Theorem 4.18 says that the expected length of a Huffman code, denoted by L_{Huff}, satisfies L_{Huff} < H_D(X) + 1. This bound is the tightest possible among all upper bounds on L_{Huff} that depend only on the source entropy. Here is the proof outline. First, we construct a code with codeword lengths l_i = ⌈−log_D p_i⌉ by showing that the Kraft inequality is satisfied. Then we show that the expected length of this code, L = Σ_i p_i l_i, is less than H_D(X) + 1. Then L_{Huff}, which is the expected length of an optimal prefix code, must be upper bounded by L, which in turn is less than H_D(X) + 1.

We now prove the upper bound on the expected length of a Huffman code. Consider constructing a prefix code with codeword lengths l_i = ⌈−log_D p_i⌉. Because l_i is the ceiling of −log_D p_i, we have −log_D p_i ≤ l_i < −log_D p_i + 1; note that the second inequality is strict. Multiplying by −1, we have log_D p_i ≥ −l_i > log_D p_i − 1, or p_i ≥ D^{−l_i} > D^{−1} p_i. Now, using the upper bound D^{−l_i} ≤ p_i, we have Σ_i D^{−l_i} ≤ Σ_i p_i = 1. That is, the lengths l_i satisfy the Kraft inequality, which implies that it is possible to construct a prefix code with codeword lengths l_i.
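Both steps of the proof outline can be checked numerically. Here is a minimal sketch in Python; the four-symbol distribution below is an assumed illustrative choice, not taken from the text:

```python
import math

# An illustrative distribution (assumed for this demo) and code alphabet size.
probs = [0.4, 0.3, 0.2, 0.1]
D = 2

# Step 1: the lengths l_i = ceil(-log_D p_i) satisfy the Kraft inequality.
lengths = [math.ceil(-math.log(p, D)) for p in probs]
kraft_sum = sum(D ** -l for l in lengths)
assert kraft_sum <= 1.0          # so a prefix code with these lengths exists

# Step 2: the expected length L of this code lies below H_D(X) + 1.
L = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * math.log(p, D) for p in probs)
assert H <= L < H + 1
print(kraft_sum, L, H)
```

The assertions mirror the two steps of the proof: the Kraft sum certifies that a prefix code with these lengths exists, and the expected length falls strictly below H_D(X) + 1.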
It remains to show that L, the expected length of this code, is less than H_D(X) + 1. Consider L = Σ_i p_i l_i. Using the upper bound l_i < −log_D p_i + 1, we obtain the strict upper bound L < Σ_i p_i (−log_D p_i + 1) = −Σ_i p_i log_D p_i + Σ_i p_i. The first summation is simply the entropy of X in base D, and the second summation is equal to 1. Thus we conclude that L_{Huff}, which is upper bounded by L, is strictly less than H_D(X) + 1.

To see that this upper bound is the tightest possible, we have to show that there exists a sequence of distributions P_k such that L_{Huff} approaches H_D(X) + 1 as k tends to infinity. This can be done by considering the sequence of D-ary distributions P_k consisting of the probability mass 1 − (D − 1)/k followed by D − 1 copies of 1/k, where k ≥ D. The Huffman code for each P_k consists of D codewords of length 1, because there are exactly D probability masses; thus L_{Huff} = 1 for all k, as shown in the figure. As k tends to infinity, H_D(X) tends to zero, because P_k tends to a deterministic distribution, and hence L_{Huff} approaches H_D(X) + 1. The theorem is proved.

To end this section, we discuss the asymptotic achievability of the entropy of X. We have seen that the expected length of a Huffman code is lower bounded by the entropy of X and upper bounded by the entropy of X plus 1. Now consider using a Huffman code to encode X_1, X_2, …, X_n, which are n i.i.d. copies of a generic random variable X. For these n random variables, the entropy is n H(X). Then this big Huffman code, with expected length L_{Huff}^n, is lower bounded by n H(X) and upper bounded by n H(X) + 1. Dividing by n, we obtain H(X) ≤ (1/n) L_{Huff}^n < H(X) + 1/n.
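The D-ary procedure and the tightness example can both be illustrated in code. The sketch below is an assumed minimal implementation of the D-ary merging procedure (the heap-based merging and the padding rule that brings the symbol count to the form D + k(D − 1) are illustrative choices, not from the text); running it on the distributions P_k shows L_{Huff} staying at 1 while H_D(X) + 1 shrinks toward 1:

```python
import heapq
import itertools
import math

def d_ary_huffman_lengths(probs, D):
    """Sketch of the D-ary Huffman procedure: pad with probability-0 dummy
    symbols so the symbol count has the form D + k*(D - 1), then repeatedly
    merge the D smallest masses.  Returns the codeword length of each symbol."""
    n = len(probs)
    pad = (1 - n) % (D - 1)            # dummies needed so count ≡ 1 (mod D - 1)
    tie = itertools.count()            # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heap += [(0.0, next(tie), []) for _ in range(pad)]
    heapq.heapify(heap)
    depth = [0] * n
    while len(heap) > 1:
        merged_p, merged_leaves = 0.0, []
        for _ in range(D):             # merge the D smallest masses
            p, _, leaves = heapq.heappop(heap)
            merged_p += p
            merged_leaves += leaves
        for i in merged_leaves:        # every leaf under the new node moves down a level
            depth[i] += 1
        heapq.heappush(heap, (merged_p, next(tie), merged_leaves))
    return depth

# The tightness example: P_k = (1 - (D-1)/k, 1/k, ..., 1/k) with D masses.
D = 3
for k in [3, 30, 3000]:
    P_k = [1 - (D - 1) / k] + [1 / k] * (D - 1)
    lengths = d_ary_huffman_lengths(P_k, D)
    L_huff = sum(p * l for p, l in zip(P_k, lengths))
    H = -sum(p * math.log(p, D) for p in P_k if p > 0)
    print(k, L_huff, H + 1)            # L_huff stays 1 while H_D(X) + 1 shrinks toward 1
```

Since each P_k has exactly D masses, a single merge produces the whole tree, so every codeword has length 1 regardless of k, matching the argument in the proof.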
Note that the right-hand side of this inequality tends to H(X) as n tends to infinity. The quantity (1/n) L_{Huff}^n is called the rate of the code, in D-its per source symbol; it is, on average, the number of D-ary code symbols required to encode one source symbol. The above inequality says that as n tends to infinity, the coding rate can approach H(X), the entropy of an individual source random variable. Thus, in the asymptotic sense, the entropy of X is the fundamental measure of the amount of information contained in the random variable X.
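This convergence of the rate to the entropy can be illustrated numerically. The sketch below is an assumed example (the binary source with P(X = 0) = 0.9 is an illustrative choice, not from the text); it builds a binary Huffman code for blocks of n i.i.d. source symbols, using the fact that the expected codeword length equals the sum of the probabilities of the internal nodes formed during merging:

```python
import heapq
import itertools
import math

def huffman_expected_length(probs):
    """Expected codeword length of a binary Huffman code, computed as the
    sum of the probabilities of the internal nodes formed during merging."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b                 # one internal node of probability a + b
        heapq.heappush(heap, a + b)
    return total

# An i.i.d. source with P(X = 0) = 0.9, P(X = 1) = 0.1 (an assumed example).
p = [0.9, 0.1]
H = -sum(q * math.log2(q) for q in p)
for n in [1, 2, 4, 8]:
    # Distribution of the block (X_1, ..., X_n): products of marginals.
    block = [math.prod(t) for t in itertools.product(p, repeat=n)]
    rate = huffman_expected_length(block) / n
    print(n, rate)                     # H(X) <= rate < H(X) + 1/n
```

For n = 1 the rate is 1 bit per symbol (a single bit is unavoidable), while the theorem guarantees that the rate stays within [H(X), H(X) + 1/n) for every block length n, so it approaches H(X) ≈ 0.469 as n grows.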