In Section 3.6, we are going to look at some examples of applications of information diagrams. Obtaining information identities from an information diagram is 'what you see is what you get'. To obtain information inequalities, we distinguish two cases.

First, if mu^* is non-negative, then whenever a set A is a subset of a set B, mu^*(A) is less than or equal to mu^*(B). This is because mu^*(A) is less than or equal to mu^*(A) plus mu^*(B minus A), since mu^* is non-negative. Now A and B minus A are disjoint, so by set additivity this sum is equal to mu^*(A union (B minus A)), which is mu^*(B). Therefore we have shown that if A is a subset of B, then mu^*(A) is less than or equal to mu^*(B). If mu^* is a signed measure, it is less straightforward, and we need to invoke the basic inequalities to compare mu^*(A) and mu^*(B).

In Example 3.12, we are going to show the concavity of the entropy function. Let X_1 have distribution p_1(x), and X_2 have distribution p_2(x). Now we define a new random variable X with distribution p(x) = lambda p_1(x) + lambda bar p_2(x), where lambda is between zero and one, and lambda bar is equal to one minus lambda. That is, p(x) is a mixture of p_1(x) and p_2(x). We are going to show that the entropy of X is greater than or equal to lambda times the entropy of X_1 plus lambda bar times the entropy of X_2.

Consider the system as shown, in which the position of the switch is determined by a random variable Z, with Pr{Z = 1} = lambda and Pr{Z = 2} = lambda bar, where Z is independent of X_1 and X_2. In the system, the switch takes position i if Z = i; that is, the switch is up when Z = 1 and down when Z = 2. The random variable Z is called a mixing random variable for the distributions p_1(x) and p_2(x). Then X has distribution p(x) = lambda p_1(x) + lambda bar p_2(x), as required.
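The switch construction above can be checked by simulation; the following is a minimal sketch, where the two distributions p_1, p_2 and the weight lambda are assumed example values, not taken from the text. Sampling Z first and then drawing X from p_1 or p_2 according to the switch position should reproduce the mixture distribution empirically.

```python
import random

# Assumed example distributions on the alphabet {0, 1, 2} and an
# assumed mixing weight lambda; all values are purely illustrative.
p1 = [0.5, 0.3, 0.2]
p2 = [0.1, 0.1, 0.8]
lam = 0.3

random.seed(0)
n = 200_000
counts = [0, 0, 0]
for _ in range(n):
    # The mixing random variable Z: Z = 1 with probability lambda,
    # Z = 2 with probability lambda bar = 1 - lambda.
    if random.random() < lam:
        x = random.choices(range(3), weights=p1)[0]   # switch up:   X = X_1
    else:
        x = random.choices(range(3), weights=p2)[0]   # switch down: X = X_2
    counts[x] += 1

empirical = [c / n for c in counts]
mixture = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
print(empirical)   # should be close to the mixture distribution
print(mixture)
```

With a large sample size, the empirical distribution of X agrees with lambda p_1(x) + lambda bar p_2(x) up to sampling noise, as the construction requires.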
Now, from the information diagram for the two random variables X and Z, we see that tilde{X} minus tilde{Z} is a subset of tilde{X}. Because mu^* is non-negative for two random variables, we can conclude that mu^*(tilde{X}) is greater than or equal to mu^*(tilde{X} minus tilde{Z}), which is equivalent to the entropy of X being greater than or equal to the entropy of X given Z. Now the entropy of X given Z is equal to Pr{Z = 1} times the entropy of X given Z = 1, plus Pr{Z = 2} times the entropy of X given Z = 2, where Pr{Z = 1} = lambda and Pr{Z = 2} = lambda bar. The entropy of X given Z = 1 is equal to the entropy of X_1, because Z = 1 means that the switch is up, in which case X is equal to X_1. Likewise, the entropy of X given Z = 2, that is, when the switch is down, is equal to the entropy of X_2. This proves inequality (1) and shows that the entropy of X is a concave functional of p(x). The interpretation of this inequality is that the entropy of a mixture of distributions is at least the mixture of the corresponding entropies.

In Example 3.13, we are going to show the convexity of mutual information. Specifically, let the pair of random variables X and Y have joint distribution p(x, y), which can be written as p(x) times p(y|x). We are going to show that for fixed p(x), I(X;Y) is a convex functional of p(y|x). Let p_1(y|x) and p_2(y|x) be two transition matrices representing two channels. Consider the system as shown, in which the position of the switch is determined by a random variable Z, as in the last example, where Z is independent of X; that is, the mutual information between X and Z is equal to 0. When the switch is up, X and Y are connected by the channel p_1(y|x), and when the switch is down, X and Y are connected by the channel p_2(y|x).
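Before moving on, the concavity inequality of Example 3.12 can be verified numerically; the sketch below reuses the same assumed example distributions and mixing weight as before (these numbers are illustrative, not from the text).

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Assumed example distributions and mixing weight lambda.
p1 = [0.5, 0.3, 0.2]
p2 = [0.1, 0.1, 0.8]
lam = 0.3

mixture = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]

lhs = entropy(mixture)                               # H(X)
rhs = lam * entropy(p1) + (1 - lam) * entropy(p2)    # lambda H(X_1) + lambda bar H(X_2)
print(lhs, rhs)   # lhs >= rhs, as inequality (1) asserts
```

The gap lhs minus rhs is exactly the quantity H(X) minus H(X|Z) = I(X;Z) from the proof, so it is zero only when the mixture reveals nothing about the switch position.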
In the information diagram for the three random variables X, Y and Z involved in this problem, we let I(X;Z|Y) = a, which is non-negative because it is a conditional mutual information. Accordingly, we have I(X;Y;Z) = -a, because I(X;Z) = I(X;Z|Y) + I(X;Y;Z) is equal to 0 by our assumption. Recall that the probability that Z = 1, that is, the switch is up, is equal to lambda, and the probability that Z = 2, that is, the switch is down, is equal to lambda bar.

Then I(X;Y) is equal to I(X;Y|Z) plus I(X;Y;Z), where I(X;Y|Z) is shown in the information diagram in blue and I(X;Y;Z) is shown in red. This is less than or equal to I(X;Y|Z), because I(X;Y;Z) = -a and hence is non-positive. We further write I(X;Y|Z) as Pr{Z = 1} times the mutual information between X and Y given Z = 1, plus Pr{Z = 2} times the mutual information between X and Y given Z = 2, where Pr{Z = 1} = lambda and Pr{Z = 2} = lambda bar. The mutual information between X and Y given Z = 1, that is, when the switch is up, is equal to the mutual information between the input and output of a channel with p(x) as the input distribution and p_1(y|x) as the transition matrix. Similarly, the mutual information between X and Y given Z = 2, that is, when the switch is down, is equal to the mutual information between the input and output of a channel with p(x) as the input distribution and p_2(y|x) as the transition matrix.

Thus we have shown that I(X;Y) is a convex functional of p(y|x). The interpretation of this result is that, for a fixed input distribution p(x), the mutual information between the input and output of the system as shown, which is obtained by mixing the two channels p_1(y|x) and p_2(y|x), is at most the mixture of the two mutual informations corresponding to p_1(y|x) and p_2(y|x), respectively.
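The convexity result of Example 3.13 can also be checked numerically. In the sketch below, the input distribution, the two channel transition matrices, and the mixing weight are all assumed example values; the mixed channel is the entrywise convex combination lambda p_1(y|x) + lambda bar p_2(y|x).

```python
import math

def mutual_information(px, pygx):
    """I(X;Y) in bits, where px is the input distribution and
    pygx[x][y] = p(y|x) is the channel transition matrix."""
    ny = len(pygx[0])
    py = [sum(px[x] * pygx[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x in range(len(px)):
        for y in range(ny):
            pxy = px[x] * pygx[x][y]
            if pxy > 0:
                total += pxy * math.log2(pxy / (px[x] * py[y]))
    return total

# Assumed example input distribution, channels, and mixing weight.
px = [0.4, 0.6]
ch1 = [[0.9, 0.1], [0.2, 0.8]]   # p_1(y|x)
ch2 = [[0.6, 0.4], [0.3, 0.7]]   # p_2(y|x)
lam = 0.25

# Entrywise mixture of the two transition matrices.
mixed = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
         for r1, r2 in zip(ch1, ch2)]

lhs = mutual_information(px, mixed)   # I(X;Y) for the mixed channel
rhs = lam * mutual_information(px, ch1) + (1 - lam) * mutual_information(px, ch2)
print(lhs, rhs)   # lhs <= rhs: convexity in p(y|x)
```

The difference rhs minus lhs equals the quantity a = I(X;Z|Y) from the proof, so it vanishes only when Y reveals nothing about the switch position beyond what X already does.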
In the next example, we show the concavity of mutual information. Specifically, let the pair of random variables X and Y have joint distribution p(x, y), which is again written as p(x) times p(y|x). We are going to show that for fixed p(y|x), I(X;Y) is a concave functional of p(x). Consider the system as shown, where the position of the switch is determined by a random variable Z as in the last example. Here, when the switch is up, X takes the distribution p_1(x), and when the switch is down, X takes the distribution p_2(x). In this setup, when X is given, Y is independent of Z, because the channel output Y depends on the position of the switch only through the channel input X. So Z → X → Y forms a Markov chain. As we have seen before, for a Markov chain, mu^* is non-negative, and the information diagram for X, Y, Z is as shown.

From the information diagram, we see that tilde{X} intersect tilde{Y} minus tilde{Z}, which is shown in blue, is a subset of tilde{X} intersect tilde{Y}, which is shown in red. Because mu^* is non-negative, we immediately see that I(X;Y) is greater than or equal to I(X;Y|Z), which is equal to the probability that Z = 1 times I(X;Y|Z=1), plus the probability that Z = 2 times I(X;Y|Z=2). Now the probability that Z = 1 is equal to lambda, and I(X;Y|Z=1), that is, when the switch is up, is equal to the mutual information between the input and output of a channel with p_1(x) as the input distribution and p(y|x) as the transition probabilities. Likewise, the probability that Z = 2 is equal to lambda bar, and I(X;Y|Z=2), that is, when the switch is down, is equal to the mutual information between the input and output of a channel with p_2(x) as the input distribution and p(y|x) as the transition probabilities. This shows that for fixed p(y|x), I(X;Y) is a concave functional of p(x).
The interpretation is that, for a fixed channel, by mixing the input distributions, the mutual information between the input and the output is at least the mixture of the corresponding mutual informations.
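This concavity inequality can likewise be checked numerically; in the sketch below, the fixed channel, the two input distributions, and the mixing weight are assumed example values, and the mixed input is lambda p_1(x) + lambda bar p_2(x).

```python
import math

def mutual_information(px, pygx):
    """I(X;Y) in bits, where px is the input distribution and
    pygx[x][y] = p(y|x) is the channel transition matrix."""
    ny = len(pygx[0])
    py = [sum(px[x] * pygx[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x in range(len(px)):
        for y in range(ny):
            pxy = px[x] * pygx[x][y]
            if pxy > 0:
                total += pxy * math.log2(pxy / (px[x] * py[y]))
    return total

# Assumed example fixed channel, input distributions, and mixing weight.
pygx = [[0.9, 0.1], [0.2, 0.8]]   # fixed p(y|x)
p1 = [0.3, 0.7]                   # input distribution when the switch is up
p2 = [0.8, 0.2]                   # input distribution when the switch is down
lam = 0.4

mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]

lhs = mutual_information(mix, pygx)   # I(X;Y) for the mixed input distribution
rhs = lam * mutual_information(p1, pygx) + (1 - lam) * mutual_information(p2, pygx)
print(lhs, rhs)   # lhs >= rhs: concavity in p(x)
```

The gap lhs minus rhs is I(X;Y) minus I(X;Y|Z), which is non-negative here precisely because Z → X → Y is a Markov chain, mirroring the diagram argument above.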