
So the variance of a random variable is another expected value property of a distribution. Recall that the mean measures the center of a distribution; the variance measures how spread out it is.

If X is a random variable and it has mean mu, that is, the expected value of X equals mu, then the variance of X is defined as the expected value of the quantity X minus mu, the whole thing squared. So what does that mean? The expected value is in essence an average, right? It's sort of the average or the typical value that the random variable takes, the center of the distribution. The variance, on the other hand, is sort of the average squared distance the random variable is from the mean.

What that implies is that random variables with higher variances come from distributions that are more spread out than ones that have a lower variance.
>> That makes sense, and I'm just kind of thinking of that fulcrum point still.
>> Yeah.
>> How things are more spread out [inaudible].
>> Yeah, exactly, great.
>> Alright.

>> And so let me just remind you what this formula, the variance formula, means again. If you were to take the random variable X and subtract off its population mean mu, you'd get the exact same distribution, just with all the possible values of X shifted by mu, so that it has mean zero. And if you were then to take that shifted random variable, figure out the distribution of its square, and take the expected value of the resulting random variable, that would be the variance. That's hard, so we don't ever calculate the variance that way. We typically calculate the variance by a convenient shortcut, and that is that the variance of a random variable is the expected value of X squared minus the expected value of X, quantity squared; and again, this expected value of X quantity squared is just mu squared.

This shortcut formula, then, requires you to calculate the expected value of X squared. The more convenient way to do that is, if the variable is discrete, to use the summation of t squared times p of t, where p is the probability mass function, or, if it's continuous, the integral of t squared times f of t, where f is the density function. It would be a nice exercise for you to show that the original variance calculation equals the shortcut variance calculation, just by expanding the square and using the expected value rules. It would be convenient if the variance operator were also linear. It's not. As an example, if you pull a constant out of the variance, it gets squared: the variance of a times X, where a is a constant, not a random variable, is a squared times the variance of X.
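To make this concrete, here's a small check in code; the distribution is made up purely for illustration. It shows the definitional variance, the shortcut formula, and the a-squared scaling all lining up:

```python
# Illustration with a made-up discrete distribution: the definitional
# variance E[(X - mu)^2] matches the shortcut E[X^2] - (E[X])^2, and
# pulling a constant a out of the variance squares it.
values = [1, 2, 5]           # hypothetical support of X
probs  = [0.2, 0.5, 0.3]     # hypothetical pmf, sums to one

def expect(vals, ps):
    """Expected value as a probability-weighted sum."""
    return sum(v * q for v, q in zip(vals, ps))

mu        = expect(values, probs)                              # E[X]
var_def   = expect([(v - mu) ** 2 for v in values], probs)     # definition
var_short = expect([v ** 2 for v in values], probs) - mu ** 2  # shortcut

a = 3
var_scaled = expect([(a * v - a * mu) ** 2 for v in values], probs)  # Var(aX)
print(var_def, var_short, var_scaled)
```

The two variance formulas print the same number, and the scaled variance is exactly a squared times it.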

The square root of the variance is called the standard deviation, and the reason we often use the standard deviation instead of the variance is that the standard deviation has the same units as the random variable. So if X is a random variable with units of inches, the variance has units of inches squared, whereas the standard deviation has units of inches.

So it's often quite convenient to talk about the spread in the same units as the random variable itself, and that's why the standard deviation is a common summary of the spread of a distribution. Well, let's calculate a variance. What's the variance of a toss of a die? In this case the expected value of X is 3.5; we've covered that already.

Now let's calculate the expected value of X squared. Well, we have one squared times a sixth, plus two squared times a sixth, plus three squared times a sixth, plus four squared times a sixth, plus five squared times a sixth, plus six squared times a sixth. That works out to be about 15.17.
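A quick check of that arithmetic (the code is just an illustration, using exact fractions to avoid rounding):

```python
from fractions import Fraction

# Checking the die calculation: E[X] = 3.5, E[X^2] = 91/6 (about 15.17),
# and Var(X) = E[X^2] - (E[X])^2 = 35/12 (about 2.92).
p = Fraction(1, 6)                           # each face has probability 1/6
ex  = sum(t * p for t in range(1, 7))        # E[X]
ex2 = sum(t ** 2 * p for t in range(1, 7))   # E[X^2]
var = ex2 - ex ** 2                          # shortcut formula
print(float(ex), float(ex2), float(var))
```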

And then you subtract: 15.17 minus 3.5 squared works out to be about 2.92. Let's go through a very important formula. Let's suppose we flip a coin.

But let's make it slightly more interesting. Instead of the coin having probability one half of a head, let's say that it has probability p of a head. So here the expected value of X equals zero times the probability of a tail, which is one minus p, plus one times the probability of a head, which is p; so it works out to be p as the expected value.

And of course this agrees with our calculation when p happens to be one half, for a fair coin. Now let's calculate the expected value of X squared. Actually, it's pretty easy to do in this case, because X only takes on the values zero and one; if you square zero you get zero, and if you square one you get one. So X squared is in fact exactly X, and the expected value of X squared is equal to the expected value of X, which we already calculated as p. So the variance of X in this case is the expected value of X squared minus the expected value of X quantity squared, which is p minus p squared, which works out to be p times one minus p, a formula you may have encountered before.

It's interesting to note that this variance formula is maximized when p is 0.5; just plot the function p times one minus p between zero and one.
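Both claims, the p times one minus p formula and the maximum at one half, can be checked with a few lines of code; this is a sketch, and the grid scan is just one way to see the maximum:

```python
# Checking two claims about the Bernoulli(p) coin: the variance is
# p(1 - p), and that product is maximized at p = 0.5.
def bernoulli_var(p):
    ex = 0 * (1 - p) + 1 * p                 # E[X] = p
    ex2 = 0 ** 2 * (1 - p) + 1 ** 2 * p      # X^2 = X, so E[X^2] = p
    return ex2 - ex ** 2                     # p - p^2

grid = [i / 1000 for i in range(1001)]       # p values between 0 and 1
best = max(grid, key=lambda p: p * (1 - p))  # where p(1 - p) peaks
print(bernoulli_var(0.3))                    # matches 0.3 * 0.7
print(best)                                  # the maximizing p
```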

Plot this function between zero and one and you'll see that it's maximized at 0.5. So the most variable a coin flip can be is if it is in fact exactly a fair coin. It's interesting to note that, in general, the most variable a bounded random variable can be is if you shove all its mass to the two endpoints, equally distributed between them. So if you have a continuous random variable and you want to make it more variable, you chop out the middle and spread the mass out equally between the two ends.

In fact, let's talk about this in greater detail. Suppose you have any random variable that takes values between zero and one, like a uniform random variable, and its expected value is p. Since the variable takes values between zero and one, p has to be a number between zero and one. And notice that if X is a random variable between zero and one, X squared has to be less than or equal to X, because if you take any number between zero and one and square it, you get a number that's no larger. And so the expected value of X squared has to be less than or equal to the expected value of X, which is p.
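That chain of inequalities can be sketched numerically; the pmfs here are made up for illustration:

```python
# Numeric sketch: for X supported on [0, 1] with mean p, E[X^2] <= E[X] = p,
# so Var(X) = E[X^2] - p^2 <= p - p^2 = p(1 - p).
dists = [
    ([0.0, 0.5, 1.0], [0.25, 0.5, 0.25]),   # hypothetical pmfs on [0, 1]
    ([0.1, 0.4, 0.9], [0.3, 0.4, 0.3]),
    ([0.0, 1.0],      [0.3, 0.7]),          # Bernoulli: attains the bound
]
for vals, probs in dists:
    p   = sum(v * q for v, q in zip(vals, probs))      # E[X]
    ex2 = sum(v * v * q for v, q in zip(vals, probs))  # E[X^2]
    var = ex2 - p ** 2
    print(var <= p * (1 - p) + 1e-12)       # the p(1 - p) bound holds
```

The Bernoulli case hits the bound exactly, which is the point of the argument.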

Therefore the variance of X, which is the expected value of X squared minus the expected value of X quantity squared, has to be less than or equal to the expected value of X minus the expected value of X quantity squared, which is p times one minus p. And basically this is then a proof that the Bernoulli variance, the binary case where the random variable can only take the values zero or one, is the largest possible for a random variable on zero to one that has expected value p. And we also noted earlier that the maximum value you can get is when p is in fact 0.5. So this is basically a simple little proof that the largest variance you can get for such a random variable comes from shoving its mass to the two endpoints, and the closer you get to equal mass at both endpoints, the larger the variance is. I'm not sure if I mentioned this previously, but I called the variable, a coin flip that can take heads with probability p, a Bernoulli random variable.

This is named after the mathematician Jacob Bernoulli, who is one of the fathers of probability, and Jacob Bernoulli is an interesting character; you should read up on him. The Bernoullis were a very famous mathematical family who came up with lots of discoveries. Jacob was a particularly influential member of the Bernoulli family, and he discovered quite a bit of probability theory very early on. At any rate, when you have a random variable that takes the value one with probability p and zero otherwise, we call that a Bernoulli random variable. So here we are, back talking about variances. Variances, and equivalently standard deviations, are kind of difficult things to understand. I prefer to interpret standard deviations.

Intuitively we know that bigger variances mean distributions that are more spread out, but we need some way to actually interpret what bigger means. In the context of a specific distribution, we might learn the quantities associated with that distribution, to know what one standard deviation means, or two standard deviations, or three standard deviations.

And that's particularly true of the Gaussian, or bell-shaped density.

We tend to know the values associated with those standard deviations sort of by heart. But there is a general rule that applies to all distributions: the so-called Chebyshev inequality, named after the Russian mathematician Chebyshev. At any rate, Chebyshev gave a really useful inequality for interpreting variances. Basically, the inequality says that the probability that a random variable is k standard deviations from its mean, or more, is less than or equal to one over k squared.

So let me repeat that, because it's so important: the probability that a random variable is more than k standard deviations from its mean is less than or equal to one over k squared. And let's just look at some simple benchmarks for k. The probability that a random variable is more than two standard deviations from its mean is 25 percent or less; the probability that it is more than three standard deviations from its mean is eleven percent or less; the probability that it is more than four standard deviations from its mean is six percent or less. And again, note that this is a bound on the probability statement, not an equality; it's the worst it could possibly be. For lots of distributions, the probability of being four standard deviations or more beyond the mean is far lower than six percent, but six percent is the worst it can be. So it's unlikely, for example, that if you observe a random variable, you will see it six standard deviations from the mean; that has probability less than one over 36, regardless of the distribution. What's interesting about Chebyshev's inequality is that it's quite easy to prove.
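Written out compactly, the argument in the continuous case that follows is:

```latex
P(|X - \mu| \ge k\sigma)
  = \int_{\{x \,:\, |x-\mu| \ge k\sigma\}} f(x)\,dx
  \le \int_{\{x \,:\, |x-\mu| \ge k\sigma\}} \frac{(x-\mu)^2}{k^2\sigma^2}\, f(x)\,dx
  \le \int_{-\infty}^{\infty} \frac{(x-\mu)^2}{k^2\sigma^2}\, f(x)\,dx
  = \frac{\operatorname{Var}(X)}{k^2\sigma^2}
  = \frac{1}{k^2}
```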

So let's just go through the proof really quickly. Look at this probability statement: the probability that a random variable is more than k standard deviations from its mean. And let's do it in the continuous case; you can prove it more generally, but this gives you the intuition behind the proof. Well, that's the integral of f of x dx over the set of x that are more than k standard deviations from the mean, where the little x in the domain of integration is a dummy variable of integration; we could replace it by another letter on the right-hand side, but on the left-hand side it has to be capital X. Now notice that on this set, the absolute value of x minus mu over k sigma has to be bigger than one, so if we square it, it's still bigger than one; you take a number that's bigger than one and square it, and it's still bigger than one. So we can multiply the integrand by x minus mu squared over k squared sigma squared.


And we've only made the integral bigger, right? So we can replace the equality with an inequality, where the alligator's chomping the bigger part. [laugh] Okay? So now we have this quantity here, and we'll only make it bigger yet if, instead of integrating over this restriction of the domain, we integrate over the whole thing, from minus infinity to plus infinity, because the x minus mu squared term is nonnegative, so we only make the integral bigger. And then notice that the one over k squared sigma squared part is a scalar that we can factor out, leaving the integral from minus infinity to plus infinity of x minus mu squared times f of x dx; well, that's exactly the definition of the variance, so it equals sigma squared. The sigma squareds cancel, and you get one over k squared. So we see that the probability that X is more than k standard deviations from the mean, which started with an equality, then got bigger, then got bigger again, then ended with a final equality, is less than or equal to one over k squared. So I find it remarkable that Chebyshev's inequality, this powerful result that applies to all distributions, has such a simple little proof. Let's go through some numerical examples, just to show why this result is useful. So: intelligence quotients.

Actually, I would recommend that you look up intelligence quotients, which are often called Binet scales. They have a very rich and interesting history that intersects with statistics, psychology, and several other fields, and it's really quite an interesting literature; I would highly recommend you look it up, just because it's quite fun. But let's skirt that discussion and just say: let's suppose intelligence quotients really are distributed with a mean of 100 and a standard deviation of fifteen.

What's the probability that a randomly drawn person from this population, with IQs of mean 100 and standard deviation fifteen, has an IQ higher than 160 or below 40? And of course I picked 160 and 40 specifically: 160 is four standard deviations above the mean, and 40 is four standard deviations below the mean, so Chebyshev's inequality says that this probability will be no larger than six percent. If in fact the IQ distribution is bell-shaped, or Gaussian, this bound is very, very conservative. Just to give you a sense of how conservative: the probability of a random draw from a bell curve being four standard deviations from the mean is not six percent but on the order of ten to the minus fifth, thousandths of one percent. Again, that doesn't violate the Chebyshev inequality; ten to the minus fifth is less than .06, so it's fine, but it's quite a bit less. That's just to give you a sense of how conservative the Chebyshev inequality can be.
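To put numbers on how conservative the bound is at four standard deviations, here's an illustration using the complementary error function from Python's standard library:

```python
from math import erfc, sqrt

# Chebyshev's bound at k = 4 says 6.25% at most, while the actual
# two-sided tail of a standard normal is on the order of ten to the
# minus fifth.
k = 4
chebyshev_bound = 1 / k ** 2       # 1/16 = 0.0625
gaussian_tail = erfc(k / sqrt(2))  # P(|Z| >= 4) for standard normal Z
print(chebyshev_bound, gaussian_tail)
```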

Let me go through another example. A buzz phrase in industrial quality control is Motorola's so-called Six Sigma, and I have to admit to being largely ignorant of exactly what the Six Sigma industrial protocol is. But the gist of it, as far as I understand, is that businesses are urged to control extreme events, or rare defective parts, and the idea is that you go out six standard deviations. Maybe on your own you can go look up what exactly the Six Sigma protocol is. Let's, as an intellectual exercise, talk about what the probability of a six sigma event is: the idea of having a random variable that lies six standard deviations above or below the mean. By Chebyshev's inequality, the probability of such an occurrence is less than one over six squared, which is about three percent.

So it's highly unlikely. But again, remember that Chebyshev's is a bound that applies to all distributions. If you know something about the distribution, for example that it's a bell curve, then the probability of a six sigma event is on the order of ten to the minus ninth, which I calculated to be one ten-millionth of a percent. Again, that doesn't violate Chebyshev's inequality; ten to the minus ninth is less than .03. So it doesn't violate Chebyshev's inequality, but at any rate, that's what a Six Sigma event is.
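As a final numeric check (again just an illustration, using the standard-library error function), here are the two numbers quoted for a six sigma event:

```python
from math import erfc, sqrt

# Six sigma event: Chebyshev gives 1/36, about 2.8%, for any
# distribution; for a bell curve the two-sided tail is on the
# order of ten to the minus ninth.
k = 6
chebyshev_bound = 1 / k ** 2       # 1/36
gaussian_tail = erfc(k / sqrt(2))  # P(|Z| >= 6) for standard normal Z
print(round(chebyshev_bound, 3), gaussian_tail)
```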