
Or you can do the sum, substitute the p_nk back in, and get back to P(z, u). Okay, so that's enumeration.

And again, this math over here is not really needed except to check your

understanding of what's going on. Now let's look at the cumulative cost.

The way that we compute averages when using generating functions is to take the total cost over the objects of size n and divide that by the number of objects of size n. That's the average.

That's a slightly different calculation than working with probabilities, but since we have this powerful mechanism for counting, and for getting counting results, it's natural to do it this way, and we can see from the ordinary BGF how it works. So we're interested in the total cost of the objects of size n.

So, again: for each value of k, we multiply p_nk, the number of objects of size n and cost k, by k; that gives the total cost for those objects. Then we sum over all values of k, and we'll call that q_n, so q_n = sum over k of k p_nk.

What's that value in terms of the generating function? Well, what we do is differentiate with respect to u, the marker variable for the cost, and then evaluate at u = 1.

So that's the partial with respect to u of the ordinary bivariate generating function, evaluated at u = 1. Just from the definition, that's the sum, over all objects, of the cost of the object times z to the size of the object, and if you collect together all the objects of a given size n, that's exactly the sum of q_n z^n.

So q_n is the coefficient of z^n in the partial with respect to u of the OBGF, evaluated at u = 1. Or you can look at this representation of the generating function, the one at the top.

If you differentiate with respect to u, it brings down a k, so you get k p_nk, and that's the familiar way to think of the total cost. But in terms of the bivariate generating function, it's a simple partial derivative, evaluated at u = 1.

So that gives us what we need to calculate the mean cost of objects of size n: take the coefficient of z^n in the partial with respect to u of the OBGF, evaluated at u = 1, and divide by the coefficient of z^n in the OBGF evaluated at u = 1. In many cases, that's a simple calculation. And again, that's the average, which may be more familiar to you in the form over here, sum of k p_nk divided by p_n. And we can calculate the variance in the same way.

The definition of the variance is the sum over k of (k minus the mean) squared, times the probability. In terms of the ordinary bivariate generating function, it's not hard to check that it's the coefficient of z^n in the second partial with respect to u, evaluated at u = 1 (normalized by p_n), plus the mean, minus the square of the mean, just as with ordinary probability calculations.
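As a sanity check on that variance recipe, here's a minimal brute-force sketch, using the number of zero bits in n-bit strings as the parameter (an example that comes up later in this lecture). Differentiating twice with respect to u and setting u = 1 yields the second factorial moment, sum of k(k-1) p_nk / p_n; adding the mean and subtracting the mean squared gives the variance.

```python
from itertools import product

# Brute-force the counts p_{n,k}: number of n-bit strings with k zero bits.
n = 6
counts = {}
for bits in product('01', repeat=n):
    k = bits.count('0')
    counts[k] = counts.get(k, 0) + 1

p_n = sum(counts.values())                                   # 2^n objects
mean = sum(k * p for k, p in counts.items()) / p_n           # n/2
second_factorial = sum(k * (k - 1) * p for k, p in counts.items()) / p_n
variance = second_factorial + mean - mean ** 2               # works out to n/4 here
print(p_n, mean, variance)                                   # 64 3.0 1.5
```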

So that's an overview of calculating moments from an ordinary bivariate generating function: just use partial derivatives with respect to the variable that marks the cost, and then evaluate at that variable equals 1. And we'll look at an example next.

So this is our bit-strings example: what's the number of zero bits in a random binary bit string? We talked in the last section about our construction and the OBGF equation, 1/(1 - z(1 + u)).

So now we'll start with this equation and calculate the average number of zero bits in a bit string. And again, we know the answer: it's going to be n/2. But this will illustrate the calculation, and we'll use the very same straightforward method for more difficult problems. So, how many binary bit strings are there?

Well, we evaluate at u = 1: that's 1/(1 - 2z), and the coefficient of z^n is 2^n, as expected. To get the cumulated cost, the total number of zero bits in all bit strings of length n, we need to compute the partial with respect to u of the OBGF, 1/(1 - z - zu). All that's relevant is the -zu term: differentiating (1 - z - zu)^(-1) gives minus the square of that, times -z; the two minuses cancel, so the partial with respect to u is z/(1 - z - zu)^2.

And to calculate the accumulated cost, we just evaluate that at u = 1: with u = 1, it's z/(1 - 2z)^2. So for the accumulated cost we want the coefficient of z^n in z/(1 - 2z)^2, and that's an elementary calculation.

From standard power series, it's n 2^(n-1). And now the average is just the ratio of those two, and that's n/2, as expected. So it's just computing the partial with respect to u, which, once you've practiced a few times and remembered the definitions, is not so difficult a calculation at all, and it's certainly one that can be done automatically nowadays.

We're not going to compute the variance here, because there's an easier way to do it that we'll talk about in just a second.

Okay, now, I mentioned the idea of horizontal and vertical generating functions that go with bivariate generating functions, and now I want to take a look at moment calculation using these methods.

So, let's look at the moment calculations in terms of the horizontal ordinary generating function. That's a generating function that gives the costs for the objects of a given size. What we do is take the coefficient of z^n in the bivariate generating function, and we call that p_n(u). It's a generating function for the cost over all the objects of size n: it's just the sum, over all objects of size n, of u to the cost of that object. And again, from the cheat sheet over here, in terms of p_nk, that's the sum over k of p_nk u^k.

So it's the generating function for cost, and it's only got one variable. Since we're interested in the number of objects of size n and the average cost of an object of size n, we can use that generating function directly to get the answer that we want. First, enumeration: if you evaluate at u = 1, you get the number of objects of size n, because that's summing, over all values of k, the number of objects of size n with cost k, which equals the number of objects of size n. So p_n(u) is useful both for enumeration and for the accumulated cost: if you differentiate the function and evaluate at 1, you get the sum of k p_nk, which is the accumulated cost q_n. So: evaluate p_n(u) at 1 to enumerate, differentiate and evaluate at 1 to get the accumulated cost, and divide those two to get the mean. That's an even simpler calculation, usually.

And the calculation for the variance just involves the second derivative; again, that's easy to check from the definition of the variance. So this method, using horizontal ordinary generating functions, is the one that we use most often to calculate average values and other moments of parameters.
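The horizontal recipe is easy to put into code. Here's a minimal sketch, again for zero bits in n-bit strings, where p_n(u) = sum of C(n,k) u^k: everything is computed from the coefficient list, with p_n(1) enumerating, p_n'(1) giving the accumulated cost, and p_n''(1) feeding the variance.

```python
from math import comb

def moments_from_horizontal(coeffs):
    # coeffs[k] is the number of objects of size n with cost k
    p1 = sum(coeffs)                                         # p_n(1): enumeration
    d1 = sum(k * c for k, c in enumerate(coeffs))            # p_n'(1): accumulated cost
    d2 = sum(k * (k - 1) * c for k, c in enumerate(coeffs))  # p_n''(1)
    mean = d1 / p1
    variance = d2 / p1 + mean - mean ** 2
    return p1, mean, variance

n = 10
p1, mean, variance = moments_from_horizontal([comb(n, k) for k in range(n + 1)])
print(p1, mean, variance)   # 1024, 5.0, 2.5 (that is, 2^n, n/2, n/4)
```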


Now, we can also do moment calculations with vertical ordinary generating functions. I put this slide up for completeness; I'm not going to spend a lot of time on it, because in the lectures we're not going to do any derivations this way. But it's the same idea, just on the other variable, the vertical instead of the horizontal. If you take the coefficient of u^k in the ordinary bivariate generating function, you get the generating function for the objects of cost k. It's the same idea, you're just holding the cost constant, and that's the sum of p_nk z^n.

And to enumerate, we do it the same way, with the bivariate generating function. But the accumulated cost is the coefficient of z^n in the sum of k q_k(z), where q_k(z) is the vertical generating function. So we have to extract another coefficient, but there are some problems where it's easier to do the calculation this way. And it gets down to the mean again, and you could do the variance the same way, but I'm going to skip it.

So again, this is just for completeness; this is not the way to determine the average number of zero bits in a bit string, but it's how you might do it mechanically, and it illustrates the idea. If we pull out the coefficient of u^k, we get the vertical generating function: in this case, it's z^k/(1 - z)^(k+1). Then we enumerate the same way as before to get the 2^n, but the cumulated cost is the coefficient of z^n in the sum over k of k times the vertical OGF.

And that's a familiar geometric series, with an extra factor. Here are the details, for those of you who are used to those sums: the sum evaluates to z/(1 - 2z)^2, the coefficient of z^n of that is n 2^(n-1), and so we get n/2 as before.
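If you'd like a quick numeric sanity check of that sum, here's a sketch (truncating the series at 200 terms, which is plenty at z = 0.1, since z/(1 - z) < 1 there):

```python
# Check: sum over k of k * z^k / (1 - z)^(k+1) equals z / (1 - 2z)^2
z = 0.1
lhs = sum(k * z ** k / (1 - z) ** (k + 1) for k in range(1, 200))
rhs = z / (1 - 2 * z) ** 2
print(lhs, rhs)   # both approximately 0.15625
```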

And again, in this case it's more work to get the same answer, but there are some cases where it's less work. It's just a matter of the order in which we extract the coefficients; usually, it's better to use the horizontal form, where we extract the coefficient of z^n first.


And in this simple example, we're always getting n/2. So why the focus on the moments? Well, we're always interested in the average: what value can we expect? This is classical in probability, so I'm just going to take a slide to mention these classical results.

with mean use of N and standard deviation sigmus of N.

Then we have these famous inequalities, the Markov inequality and Chebyshev

inequality. It, it says that the probability of being

much larger and small, or smaller than the mean, must decay.

And an upper bound on the rate of the decay is measured in units given by the

standard deviation. That's why the standard deviation is

important. The Markov inequality says, you're not

going to be, too many standard deviations away from the mean.

the Chebyshev inequality, says that the probability that you're, t standard

deviations away from the mean is less than 1 over t squared.

Actually, we can get much more accurate results than these for many of the things we study, because we have the full distribution. But still, the basic idea is the same, and what we say is that if the standard deviation is of smaller order of growth than the mean, then we call the distribution concentrated, because that means that as n gets larger, the distribution concentrates around the mean. We can think of the mean value as a typical value. We could calculate out some specific bounds, but usually it suffices, in the analysis, to learn that the distribution is concentrated, and we move on, knowing that we could do a more detailed analysis if we wanted.

This is just a formal statement of that: if a distribution is concentrated, then the ratio of your random value to the mean approaches one, in probability, in a very specific sense. So when it's concentrated, we say the expected value is typical: that's the one that you really do expect to get.

And this is just an example, for random bits. If you have 100 million random bits, you expect about half of them, 50 million, to be one bits and about half to be zero bits. The standard deviation is the square root of N over 2, so it's only about 5,000. And that says that the probability that the random value you get is between 49,900,000 and 50,100,000 is better than 99%. That's the kind of result everyone wants in practice.
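The 99% figure follows from the Chebyshev inequality. Here's a minimal sketch of the arithmetic (assuming N = 100 million fair bits, so the standard deviation is sqrt(N)/2): a deviation of 100,000 is 20 standard deviations, and 1 - 1/20^2 = 0.9975 > 99%.

```python
import math

n = 100_000_000
sd = math.sqrt(n) / 2            # sqrt(N)/2 = 5000 for fair random bits
t = 100_000 / sd                 # 100,000 away from the mean is t = 20 sigmas
chebyshev = 1 - 1 / t ** 2       # P(within t sigmas) >= 1 - 1/t^2
print(sd, t, chebyshev)          # 5000.0 20.0 0.9975
```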

With really high probability, we have a good idea of what the value of the random variable is going to be; that's what concentration gives us. So this is just a plot of this distribution, which is the binomial distribution.

As n increases, there's another curve, and what the plot shows is that as n gets higher, the curves pull tighter and tighter in towards the mean. And that's true in many, many cases; in fact, this particular curve describes lots of distributions. We'll get to that later. Alright, so here's just another example, really a generalization of our number-of-zero-bits example.

So let's talk about the class of all m-words. That's just strings where every letter is an integer between 1 and m, a sequence over an m-character alphabet, as we talked about last time. The size, again, is going to be the number of letters, the length of the word, but let's take as a parameter the number of occurrences of a particular letter in the word. So, for the bivariate generating function, to construct that we use u to mark one of the letters, a particular letter, and then we don't mark all the others.

So the unmarked letters contribute (m - 1)z. We had m = 2 before, where we used z + uz; now, for general m, it's (m - 1)z + uz, and that immediately translates to 1/(1 - (m - 1)z - uz), and that's it.

So now that's our ordinary bivariate generating function. If we want to enumerate, we just evaluate at u = 1: then we want the coefficient of z^n in 1/(1 - mz), and that's m^n. So that checks.

And if we want the accumulated cost, we differentiate with respect to u and find the coefficient of z^n. That's the same kind of calculation that we just did for m = 2, and we get n m^(n-1). So the average number of occurrences of a given letter, in a random m-word with n letters, is n/m.

And again, that's as expected. We can also go ahead and compute the variance, and it's n/m - n/m^2. The square root of that is proportional to the square root of n, so the distribution is concentrated. And that's a direct analysis using the symbolic method, followed by calculations using partial derivatives, the basic method that we're going to use to evaluate parameters.

And I mentioned the practical application of hashing algorithms; this is from our algorithms books, or Knuth's books, or any book on algorithms. We organize a table of values by hashing them to random values between 0 and m - 1: the input is n items hashed onto m lists, and we're interested in the lengths of the lists, because we're going to have to search the lists sequentially. So that's the strategy, and what we're interested in is the average number of keys on a list. And again, it's trivial that the average is n/m, but that's also what we just proved with the symbolic method.

But the extra analysis, to get the standard deviation, tells us that the distribution is concentrated, which means that as n increases, that value is going to be typical. And that's useful to know in practice: we're not likely to hit the case where all the keys fall on one list and none fall on the others. So that's just one practical example, among the many applications that we look at.
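A quick simulation of the hashing setup is easy to sketch (with hypothetical parameters n = 100,000 keys and m = 1,000 lists, and random.randrange standing in for the hash function): the average list length is exactly n/m by construction, and concentration shows up in the fact that even the longest list stays close to n/m.

```python
import random

def hash_simulation(n, m, seed=1):
    # distribute n keys uniformly at random over m lists
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1
    return sum(lengths) / m, max(lengths)

average, longest = hash_simulation(n=100_000, m=1_000)
print(average, longest)   # average is exactly 100.0; longest stays near it
```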

In many, many of the applications that we study, what we want to do is find the average value and show that the standard deviation is of smaller order. So that's the process of calculating moments from the ordinary bivariate generating function, again with the illustration from familiar bit strings, but also one practical application. Next, we'll look at applying these same kinds of calculations in other applications.
