Hi, and welcome back. In this video, we're going to continue our study of discrete random variables. We're going to add to it the notions of expectation and variance. Let's start with a motivating example from the previous video. In that example, we had a patient who needed a kidney transplant, and we tested people one at a time until we found a successful match. We constructed the probability mass function for the random variable X that counted the number of people who had to be tested before a successful match was found. Now, I want to ask another question. On average, how many potential donors must be tested before a successful match is found? In other words, what's the expected value, also known as the average or the mean, of the random variable? Notationally, we're going to use E of X to represent the expected value of X. A lot of times in statistics we use Mu sub x, where the subscript x tells us which random variable we're referring to. When we have multiple random variables in a problem, which will happen later on, having the subscript is going to be very useful. Now, what we'd like to do next is come up with an equation, or a definition, for that expected value. Let's start by thinking about an example. Suppose we have five exams, and the scores on those five exams are 70, 80, 80, 90, and 90. If I want to look at the average, I'm going to add those up and divide by five. Another way to write that would be one-fifth times 70, plus two-fifths times 80, plus two-fifths times 90, and we get 82 as our answer. Think about those coefficients: the one-fifth for the 70, the two-fifths for the 80, and the two-fifths for the 90. Those represent probabilities: if we took those five exams, put them into a pile, and drew one out, we would expect to draw a 70 with probability one-fifth, an 80 with probability two-fifths, and a 90 with probability two-fifths.
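The exam-score calculation above can be checked with a few lines of code. This is a quick Python sketch (the video itself contains no code) showing that the plain average and the probability-weighted sum over distinct values agree:

```python
# Two equivalent ways to average the five exam scores: the usual mean,
# and a sum over distinct values weighted by how often each occurs.
scores = [70, 80, 80, 90, 90]

plain_mean = sum(scores) / len(scores)

# Weight each distinct value by the fraction of exams with that value:
# 70 appears 1/5 of the time, 80 and 90 each appear 2/5 of the time.
weighted_mean = sum(v * scores.count(v) for v in set(scores)) / len(scores)

print(plain_mean, weighted_mean)  # both print 82.0
```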
We're going to take that insight, and we're going to define the expected value of our random variable as the sum, over all possible values of k, of k times the probability that X equals k. We can think of the probability that X equals k as the fraction of the population with value k. The probability acts as the weight on the value k, and then we add that over all possible values of k. That's exactly what we did up here with our simple example of five exams. What if we have a Bernoulli random variable? Remember, that means the probability that X equals zero is one minus p, and the probability that X equals one is p. What's the expected value of that? Well, we have only two possible values. We have zero times the probability of X equaling zero, plus one times the probability of X equaling one. We just get p as the expected value of a Bernoulli random variable. What about our geometric random variable? What happens there? Recall that the probability that Y equals k is p times one minus p to the k minus one. That's the probability mass function. We have to sum, for k going from one up to infinity, k times the probability that Y equals k. That's the sum from k equals 1 to infinity of k times p times one minus p to the k minus 1. We'll just recall from geometric series that the sum from k equals 1 to infinity of a times r to the k minus 1 is a over 1 minus r, if the absolute value of r is less than 1. We talked about that in the last video. We can differentiate both sides with respect to r. If we do that, on the left-hand side we get the sum from k equals 1 to infinity of a times k minus 1 times r to the k minus 2, and on the right-hand side we get a over 1 minus r, quantity squared. Now, if k equals 1, the term on the left-hand side is just 0, so we could think of the sum as starting at k equals 2 instead, because k equaling 1 doesn't give us anything.
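Written out symbolically, the series step just described looks like this (a compact summary of the spoken derivation, not shown on screen in the original video):

```latex
\sum_{k=1}^{\infty} a\, r^{k-1} = \frac{a}{1-r}, \qquad |r| < 1,
\quad\text{and differentiating both sides with respect to } r:\quad
\sum_{k=2}^{\infty} a\,(k-1)\, r^{k-2} = \frac{a}{(1-r)^{2}}.
```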
Finally, we can re-index. I'll re-index by setting k minus 1 equal to j, and then we get the sum from j equals 1 to infinity of a times j times r to the j minus 1 equaling a over 1 minus r, quantity squared. Then what we can notice is that this form is exactly what we've got here: our a is p, and our r is 1 minus p. That's going to give us p over p squared, which is 1 over p. This fits with our example from before. If p is one-tenth, so the probability of a success is one-tenth, the expected value of our random variable is 1 over one-tenth, which is just 10, which is what our intuition told us should be true. There are a few useful properties of the expected value that I'd like to discuss now. The first one is that if c is a constant, the expected value of c is just c. What that really means is we have a random variable X that's always equal to c with probability 1, so the expected value is just c; there are no other possibilities. What if a and b are constants and we have a linear transformation of our random variable, a X plus b? Then the expected value is the sum, over all possible values of k, of a k plus b times the probability that X equals k. We can use the properties of summations to realize that this is the same as a times the sum over all k of k times the probability that X equals k, plus, factoring out the b, b times the sum over all k of the probability that X equals k. Then what we should realize is that the first sum is just the expected value of X, and the sum over all possible k of the probability that X equals k is just equal to 1, because that's the whole probability mass function. What we end up with is a times the expected value of X, plus b. We can extend that to any function h of our random variable X: the expected value of h of X is the sum, over all possible values of k, of h evaluated at k times the probability that X equals k.
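Two of the results in this stretch, the geometric mean of 1 over p and the linearity property, can both be sanity-checked numerically. Here's a quick Python sketch (not from the video) that truncates the infinite sum at a large N, where the remaining tail is negligible:

```python
# Numerical check that E[Y] = 1/p for a geometric random variable
# with p = 0.1, by truncating the infinite sum at a large N.
p = 0.1
N = 2000  # (1 - p)**N is astronomically small, so the tail is negligible

expected_value = sum(k * p * (1 - p) ** (k - 1) for k in range(1, N + 1))
print(expected_value)  # approximately 10.0, i.e. 1/p

# Linearity check, E[aY + b] = a*E[Y] + b, with illustrative a = 2, b = 3:
transformed = sum((2 * k + 3) * p * (1 - p) ** (k - 1) for k in range(1, N + 1))
print(transformed)  # approximately 23.0, i.e. 2*(1/p) + 3
```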
You might think, why would we want to transform a random variable? Actually, it comes up from time to time. When you collect data, sometimes you want to display it in its raw form, but sometimes you might want to transform it, for example, onto a logarithmic scale. This gives us a way of calculating the expected value when we have transformed data. Related to expected value is the concept of the variance of a random variable. We denote it by V of X, and it measures how far we expect a random variable to be from the mean: it's the expected value of X minus the mean, quantity squared. So here's our random variable, and here's our mean, and the value of X minus the mean tells us how far away from the mean our data is. Now, you might think, why do we want to square it? Well, if you think about it, some of our data is going to be bigger than the mean and some of it will be less than the mean. If we don't square the differences, they will average out to zero. By squaring, we make everything non-negative, so our variance is never going to be a negative number. We will frequently denote the variance by Sigma squared sub x, where the sub x just refers to the random variable that we're interested in. How do we calculate this? Well, we start with the sum over all possible values of k, coming from this expected value right here. We take k minus our mean, square that, and multiply by the probability of getting that value. We can expand the square, so we get k squared, minus 2 Mu sub x times k, plus Mu sub x quantity squared, all times the probability that X equals k. I'm going to write it out one more time. We get the sum over k of k squared times the probability that X equals k, minus 2 Mu sub x times the sum over k of k times the probability that X equals k, plus Mu sub x squared times the sum over all k of the probability that X equals k. We notice that this last sum equals 1. The middle sum is our expected value, and the first sum is something we're calling the second moment, the expected value of X squared.
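The two routes to the variance, straight from the definition and via the second moment, can be compared on a small example. This is an illustrative Python sketch with a made-up distribution (the values and probabilities are hypothetical, not from the video):

```python
# Variance two ways for a small hypothetical pmf:
# the definition  V(X) = sum over k of (k - mu)^2 * P(X = k),
# and the shortcut V(X) = E[X^2] - mu^2.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # illustrative distribution, sums to 1

mu = sum(k * pr for k, pr in pmf.items())
var_definition = sum((k - mu) ** 2 * pr for k, pr in pmf.items())
second_moment = sum(k ** 2 * pr for k, pr in pmf.items())
var_shortcut = second_moment - mu ** 2

print(var_definition, var_shortcut)  # both approximately 0.49
```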
Then we get minus 2 Mu sub x squared, plus Mu sub x squared. Finally, we get the expected value of X squared, minus Mu sub x squared. This is a very convenient computational formula. We could always use our definition and calculate the variance from that, but a lot of times it's easier to calculate the second moment minus the expected value squared. One more definition: the standard deviation is the positive square root of the variance. That will come up quite a bit in statistics. Let's go back and find the variance for our Bernoulli random variable and for our geometric random variable. For the Bernoulli, recall that the probability that X equals 0 is 1 minus p, the probability that X equals 1 is p, and we already calculated that the expected value of X is p. For the variance, I want to calculate the second moment minus the mean squared. The second moment is the sum, over all possible values of k, of k squared times the probability that X equals k. In this case, there's only one non-zero term, when X equals 1, and 1 squared times p is just p. So our variance is p minus the mean squared, that is, p minus p squared, or in other words, p times 1 minus p. That'll be the variance of a Bernoulli random variable. What about a geometric random variable? The probability mass function is 1 minus p to the k minus 1, times p, for all possible k equals 1, 2, 3, and so on. We already calculated the expected value of Y as 1 over p. Now, what we need to do is calculate the second moment, and that's the sum, over all possible values of k, of k squared times the probability that Y equals k. That's the sum over all k of k squared times 1 minus p to the k minus 1, times p.
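Although evaluating this series exactly takes some work, it's easy to check numerically. Here's a Python sketch (not from the video) that truncates the sum for the kidney-match value p equals 0.1:

```python
# Truncated-sum check of the geometric second moment and variance
# for p = 0.1 (the kidney-match example).
p = 0.1
N = 2000  # far enough out that the remaining tail is negligible

second_moment = sum(k ** 2 * p * (1 - p) ** (k - 1) for k in range(1, N + 1))
mean = 1 / p
variance = second_moment - mean ** 2

print(second_moment)  # approximately 190.0, matching (2 - p) / p**2
print(variance)       # approximately 90.0
```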
Now, this requires some fancy manipulation of series, which we're not going to do in this video. If you're interested, a reference will be provided in the reading materials. We get, as the sum, 2 minus p over p squared. Our variance for Y is then the second moment minus the first moment squared, and that's 2 minus p over p squared, minus 1 over p squared. When we simplify that, we end up with 1 minus p over p squared. Now, for example, if p equals 1/10, we saw that the expected value of Y was 10, and it turns out that the variance of Y is quite large: it's 90, and the standard deviation is the square root of 90, which is approximately 9.5. As we go through our study of various random variables, we'll start to get a feel for when these numbers are bigger and when they're smaller, and what that's telling us about the data. Let's do one more example. Suppose you have 10 pieces of paper labeled zero through nine. We put them into a hat and draw one piece of paper at random. We're going to define U to be the number drawn. Let's calculate the probability mass function, the expectation, and the variance for U. As an aside, this particular random variable is called a discrete uniform random variable. It's uniform because each value of the random variable has equal probability. For the probability mass function, we get that the probability that U equals k is 1/10, for k equals 0, 1, 2, up to 9. The expected value of U is the sum, from k equals 0 up to 9, of k times the probability that U equals k. That's 0 times 1/10, plus 1 times 1/10, all the way up to 9 times 1/10, and maybe not surprisingly, that's going to be 4.5. The second moment, the expected value of U squared, is the sum from k equals 0 up to 9 of k squared times the probability that U equals k, so that's just the sum from k equals 0 up to 9 of k squared times 1/10. This turns out to be 28.5. Finally, the variance of U is going to be the second moment minus the mean squared.
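The whole uniform example fits in a short script. Here's a Python sketch (not part of the original video) that reproduces the mean, second moment, and variance:

```python
# The discrete uniform example: slips labeled 0 through 9,
# each drawn with probability 1/10.
values = list(range(10))
n = len(values)  # 10 equally likely values

mean = sum(values) / n
second_moment = sum(k ** 2 for k in values) / n
variance = second_moment - mean ** 2

print(mean, second_moment, variance)  # 4.5 28.5 8.25
```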
That'll be 28.5 minus 4.5 squared, and that's 8.25. The variance is 8.25, and that again gives you some idea of how far your data is from the mean. We'll continue in the next video with a few more discrete random variables. Thanks for watching.