0:39

doing a process called partitioning, which is the third line from the bottom.

Partitioning takes the array to be sorted and divides it into two pieces around a position j, with the property that every element with an index less than j is less than a[j], and every element with an index bigger than j is bigger than a[j], or maybe equal.

So it partitions the array into those two parts, and if an array has that property,

then you can get it sorted simply by making the two recursive calls that sort the left part and the right part.

The partitioning process is handled by running two indices, one i and

one j, i going from low to high and j going from high to low.

And we simply scan from the left, incrementing i as long as we have elements that are less than the partitioning element, which we take to be a[lo] to start.

Then we scan from the right, decrementing j as long as we have bigger elements; when we find a small one on the right and a big one on the left, we exchange them. And we keep going until the pointers cross, at which point we put the partitioning element into its final place.

If you're not familiar with that, read about the details in Algorithms, 4th edition, or on the book's site; this is just a quick review.
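For reference, here's a sketch of the code being discussed, in the style of the quicksort in Algorithms, 4th edition (simplified to int keys for brevity; treat it as a sketch rather than the book's exact code):

```java
public class Quick {

    // Partition a[lo..hi] on the element v = a[lo]; return the index j
    // where v ends up, with a[lo..j-1] <= a[j] <= a[j+1..hi].
    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];                               // partitioning element
        while (true) {
            while (a[++i] < v) if (i == hi) break;   // scan i from the left
            while (v < a[--j]) if (j == lo) break;   // scan j from the right
            if (i >= j) break;                       // stop when pointers cross
            int t = a[i]; a[i] = a[j]; a[j] = t;     // exchange out-of-place pair
        }
        int t = a[lo]; a[lo] = a[j]; a[j] = t;       // put v into position
        return j;
    }

    // Sort a[lo..hi]: partition, then sort the two pieces recursively.
    public static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;                        // size 0 or 1: nothing to do
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }
}
```

In practice (and in the analysis that follows), the input is randomly shuffled first, so that a[lo] is a random element of the subarray.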

So we have to make some preliminary decisions when we're going to study

that kind of code.

And so the first thing is the cost model, and the cost we're going to talk about is the running time.

2:26

But really, it's better to come up with an abstraction.

We're not really talking like about microseconds or seconds.

What we want to do is talk about something abstract.

And for a sorting algorithm, what the algorithm is doing most often is comparing two items.

So let's just count compares.

Counting compares takes us from time to a discrete counting process, and that means our analysis will be valid for any kind of machine at all.

We get the frequency of execution of the compares, and then the time is just how quickly you can do a compare; that all gets rolled up into a constant.

So our hypothesis is going to be: if the number of compares is C, then the running time is ~ aC on any computer.

Now a is going to be different for different computers or

different systems or different languages, but it's a reasonable hypothesis, and

it bears out in many, many practical situations.

So, that's a big decision at the beginning to get away from time and

get into something abstract associated with the algorithm without

losing the ability to make some scientific studies later on.

So the input model, we're going to assume the input to be randomly ordered, and

as I mentioned, in the case of sorting algorithms, that's pretty easy to arrange.

We just randomly order them.

Also, just to make the math simpler in this first cut,

we're going to assume the keys to be all different.

That's not always so easy to arrange, and I'll have more to say about that a little later.

So a key question is: are these assumptions realistic? I think the answer is, we'll get into that.

But we made these assumptions and these models

with the knowledge that we're going to be able to test them scientifically.

And we can go ahead and do that, and

that's part of the process that I'll describe.

So the bottom line from this slide is that we're going to count compares, and

we're going to assume that the keys are randomly ordered and distinct.

So in that context,

now there's some relevant questions about this quicksort code.

So we're going to assume an array of size N, with entries distinct and randomly ordered.

So first question is, let's break out the partitioning process.

So how many compares are involved for this partitioning process?

Well, that doesn't take much analysis to see that what happens is that every

item in the array is touched in order to get the partitioning done.

Actually, there's one extra compare because the pointers cross, so it's N + 1.

6:38

So it's a recursive program that has a random input model.

And that leads to a formula called a recurrence relation.

And we'll look at recurrence relations in detail later on.

We've got N entries, distinct and randomly ordered.

If we let C_N be the expected number of compares used by quicksort, then we can write down an equation for C_N, a mathematical model that reflects the recursive structure of the program itself.

7:30

If the partitioning element turns out to be the kth smallest, the two subarrays are themselves randomly ordered, so the cost of doing the left subarray, which has size k-1, is just C_{k-1}, and the cost of doing the right subarray is C_{N-k}. Averaging over the N equally likely values of k gives a formula that describes the running time, the number of compares used by quicksort:

C_N = N + 1 + (1/N) sum_{1<=k<=N} (C_{k-1} + C_{N-k}).

There's one more detail: if there's only one element, we don't do anything, so we have to make sure that we define the small values of N appropriately.

8:21

So I'm going to do it for this specific recurrence, and then in a couple of lectures we'll talk more generally about how to solve problems like this. And that's a mathematical problem:

C_N = N + 1 + (1/N) sum_{1<=k<=N} (C_{k-1} + C_{N-k}),

with the small values of N handled appropriately. How are we going to possibly approach something like this?

Now, in this one, there's a couple of tricks that get the job done.

And, I'm not going to try to explain the origin of those tricks.

What I want to show is how simple the calculation is.

But we'll get this solved in just a couple of slides.

So the first thing to notice is that the two sums over the subfile sizes are really the same. If you write them out in different notation, both of them are C_0 + C_1 + ... + C_{N-1}; that is, each subfile could be any one of those sizes with equal probability. So the two sums are the same, and you can reduce the recurrence to just one sum, with a factor of 2/N outside the sum:

C_N = N + 1 + (2/N) sum_{0<=k<=N-1} C_k.

So that's a simpler form right there, justified either totally mechanically, by observing that the two sums are the same, or by arguing more broadly that there is no difference between the two subfiles; they are symmetric.

9:53

Now, the next trick is to get rid of the sum. The way we're going to do that is to multiply both sides by N, write down the same formula for N-1, and then subtract. If we do that, on the left we get N C_N - (N-1) C_{N-1}. On the right, the term N(N+1) - (N-1)N just leaves 2N; that's a little algebra. And the reason we did it is that when we subtract the sum for N-1 from the sum for N, all that's left is just one term, 2 C_{N-1}. So now we have a recurrence relation with no sum in it, and that's clearly going to be much simpler to deal with. Collecting terms, we have this very simple recurrence describing the number of compares taken by quicksort:

N C_N = (N + 1) C_{N-1} + 2N.

It's not a solution yet, but it's way simpler than what we started with, and certainly way simpler than scratching our heads, looking at the program, trying to understand how many compares it takes.

11:08

And so, actually, this is a pretty good milestone,

in the analysis of this algorithm.

Because we've now got a pretty efficient algorithm for computing the result.

If we try to compute the result from the original recurrence,

it would take quadratic time.

So, that's the code, and I'll talk more about using code to learn about recurrences. But to compute up to N from the original recurrence, you need a double for loop: for every n, a sum over every k, which is quadratic time.

So suppose we want to estimate the value of C_N for, say, N equal to a million, or a billion, or a trillion. We don't have a million squared cycles to spare, even on fast computers today.

People are trying to sort files that big, so we would want to be able to compute it. The original recurrence is not good enough to do it, but with the manipulations we just made, we have a linear-time algorithm for computing it.

Just start at zero and then use this formula to compute successive values. We might not know much about what C_N is yet, but we know enough to calculate the number of compares for any N just by running a simple program.
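That linear-time computation can be sketched as follows (class and method names are mine; this sketch applies the recurrence for all N >= 1 with C_0 = 0, which makes C_1 = 2, and choosing C_1 = 0 instead would change only lower-order terms):

```java
public class Compares {

    // Expected quicksort compare counts C_0..C_maxN, computed in linear time
    // from the sum-free recurrence N*C_N = (N+1)*C_{N-1} + 2N, with C_0 = 0.
    public static double[] table(int maxN) {
        double[] c = new double[maxN + 1];
        c[0] = 0.0;
        for (int n = 1; n <= maxN; n++)
            c[n] = ((n + 1) * c[n - 1] + 2.0 * n) / n;
        return c;
    }

    public static void main(String[] args) {
        double[] c = table(1000000);
        System.out.printf("C_N for N = 1,000,000: %.1f%n", c[1000000]);
    }
}
```

The whole table up to N = 1,000,000 takes about a million arithmetic operations, versus the roughly N^2/2 additions needed to sum the original recurrence directly.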

And it's interesting to think about: I talked about suppressing detail, and really, what we want is a fast way to compute the running time of a program. This is a good one.

There are a lot of algorithms we'd like to be able to get this far on. Actually, we're going to get a much, much faster way: computing it by just evaluating a few elementary functions.

So let's look at how to do that next.

12:53

Now we're going to actually solve it.

So we're going to express this quantity in terms of familiar elementary functions.

How are we going to do that?

Now the first step is a magic step.

It's kind of tricky but actually there's solid motivation for

it that we'll talk about later on.

So we're going to take both sides of the equation and divide by N(N+1). On the left, N C_N divided by N(N+1) is just C_N/(N+1). On the right, the N+1 factors cancel and we get C_{N-1}/N; and 2N over N(N+1) is just 2/(N+1). So:

C_N/(N+1) = C_{N-1}/N + 2/(N+1).

That looks like a slightly more complicated form, but it's actually a much simpler one, because the first term on the right is the same as the term on the left, just with N reduced by one.

And what that means is we can apply this same formula to that term.

And if we apply that same formula to that term, plugging in C_{N-1}/N = C_{N-2}/(N-1) + 2/N, we get

C_N/(N+1) = C_{N-2}/(N-1) + 2/N + 2/(N+1).

And then we can continue doing that until we get down to our starting point, C_1.

14:18

And then there are no unknowns on the right-hand side, so that's a solution. Here's a simple version of the solution, with a small constant term left out:

C_N ~ 2(N+1) sum_{1<=k<=N} 1/k - 2N.

That's just running the telescoping all the way out, multiplying by N+1, and collecting the sum. That's now a simple version with no unknowns on the right.

Now, this looks more complicated than what we started with, because it's got a sum back in there. So what do we do about that? Well, simple sums like this we can often approximate with integrals.

And again, this may be familiar to many of you,

but certainly we'll go into the math behind approximating sums with integrals.

So this sum is close to the integral of 1/x dx plus gamma, Euler's constant, which is about .57721; that is, sum_{1<=k<=N} 1/k is about ln N + gamma. And that gives us a formula for C_N in terms of familiar mathematical functions:

C_N ~ 2N ln N - 2(1 - gamma) N.

It's 2N natural log N, minus a constant times N, and C_N is tilde of that value.

15:44

Now, these are approximations, and we will be talking about how to be precise when making them. But other than that, these are fairly straightforward mathematical manipulations that get us to a solution for the number of compares used by quicksort.

So now, even when we're just doing math for the analysis of algorithms,

we can always check our math with a computer.

And that's always worthwhile.

I'm claiming that this formula is pretty close to C_N. Well, I can just write a program to compute it: that's 2*N*(Math.log(N) - (1 - .57721)) in Java, where Math.log is the natural log and .57721 approximates Euler's constant. That gives a value. And I can also, as I mentioned, compute the actual values of C_N from the recurrence. Then I can just print out a table, and sure enough, the check shows we're within 30 for N equal to a million. That's pretty close, and we could get closer if we wanted.
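A sketch of that check (names are mine; the exact values come from the sum-free recurrence with C_0 = 0, and the approximation is the formula above):

```java
public class Check {

    static final double GAMMA = 0.5772156649015329;  // Euler's constant

    // Exact expected compares C_N, from N*C_N = (N+1)*C_{N-1} + 2N, C_0 = 0.
    public static double exact(int maxN) {
        double c = 0.0;
        for (int n = 1; n <= maxN; n++)
            c = ((n + 1) * c + 2.0 * n) / n;
        return c;
    }

    // The approximation: 2 N ln N - 2 (1 - gamma) N.
    public static double approx(double n) {
        return 2.0 * n * Math.log(n) - 2.0 * (1.0 - GAMMA) * n;
    }

    public static void main(String[] args) {
        // Print a table of exact vs. approximate values.
        for (int n = 10; n <= 1000000; n *= 10)
            System.out.printf("%8d %18.1f %18.1f%n", n, exact(n), approx(n));
    }
}
```

Under this convention, the two columns for N = 1,000,000 agree to within about 30, matching the lecture's claim.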

That's the first thing that's wonderful about the analysis of algorithms, even for

beginners, is that you can always check your work.

It's very specific what this thing is.

And you're saying, I think I have something that's close to it.

Well, you can check and then you can know.

And then you can have confidence to move on.

So that's the first thing.

So with that, we've checked the math.

But there's one other thing that we have to check, and that's to check the model.

17:21

And so we thought that if it's randomly ordered distinct keys,

that should be the running time.

So what we'll do is, we'll run the code.

And this is 1,000 trials for each value of N,

N from 1 to, I don't know, a 1,000, I think.

And there's a gray dot for each trial in the spot.

You can hardly see the gray dots, with a red dot right in the middle,

which is the average for each N.

So the experiment gives us actual results, the observed numbers of compares, and what we want to compare them against is our formula. We can just plot that function and, boom, it's right on. Here we've run the algorithm maybe a million times in all, and that experiment shows that the formula we got for the number of compares is right on.
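An experiment along those lines can be sketched as follows (all names are mine; it counts compares in the partitioning scans and averages over random permutations):

```java
import java.util.Random;

public class QuickExperiment {

    private static long compares;                 // running count of compares
    private static final Random rng = new Random();

    private static boolean less(int x, int y) {
        compares++;                               // every comparison is counted here
        return x < y;
    }

    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];                            // partitioning element
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        int t = a[lo]; a[lo] = a[j]; a[j] = t;
        return j;
    }

    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Average number of compares over `trials` random permutations of size n.
    public static double avgCompares(int n, int trials) {
        double total = 0.0;
        for (int t = 0; t < trials; t++) {
            int[] a = new int[n];
            for (int k = 0; k < n; k++) a[k] = k;
            for (int k = n - 1; k > 0; k--) {     // Knuth shuffle
                int r = rng.nextInt(k + 1);
                int tmp = a[k]; a[k] = a[r]; a[r] = tmp;
            }
            compares = 0;
            sort(a, 0, n - 1);
            total += compares;
        }
        return total / trials;
    }

    public static void main(String[] args) {
        int n = 1000;
        double observed = avgCompares(n, 1000);
        double predicted = 2.0 * n * Math.log(n) - 2.0 * (1.0 - 0.57721) * n;
        System.out.printf("observed %.1f, predicted %.1f%n", observed, predicted);
    }
}
```

Recording each trial's count, instead of just the total, would also reproduce the spread of gray dots in the lecture's plot.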

18:14

So that's checking the model, and very successful in the case of Quicksort.

So just as an observation,

that maybe we're not just interested in the average costs.

Maybe we're interested in the distribution of costs.

How close is the actual running time of my one sorting problem going to be to this average?

And you can see from the narrowness of the band of gray dots that it comes pretty close to the average.

It's very typical in the analysis of algorithms that the standard deviation is a slower-growing function than the average, so the costs concentrate around the average as N increases.

But there are plenty of interesting and challenging mathematical questions when we start looking at problems like this.

What is the distribution of costs?

Most people would say well maybe it's a normal distribution,

maybe it's a bell curve.

But it's not normal. Quicksort was discovered in 1960, and people looked for a long time to try to understand whether the distribution is normal; it was only about ten years ago that it was shown that it's actually not.

19:31

So that's the exact distribution, which we can compute from a recurrence. Not just the average value: for every k, we can compute the probability that the sort takes exactly k compares.

And then what this plot is, is taking those curves and

centering them on the mean, to get an idea of what kind of curve we come close to.

And it's not a bell curve, it's got kind of a tail off to the right.

20:00

And actually, this is just an empirical validation, running again a million sorts, to show that the actual behavior is like that. And that's interesting mathematics.

What is this curve?

That's a new function.

It's not a normal distribution, it's definitely worthy of mathematical study.

After all, this is probably one of the world's most heavily used algorithms.

We might want to understand this property of it.

Maybe that'll help us understand something else in the future.

So the bottom line from all of this work, and this is just a summary of 50 years of research on Quicksort, is that a lot is known about the performance of Quicksort. And there are plenty of interesting research problems in the literally thousands of papers that have been written about Quicksort in the 50 years since it was discovered.

And that's not the end of the story, I have two more comments.

The first one is about prediction, and

this is something that we teach in the very beginning of the algorithms book.

That just using a hypothesis like this makes it simple to predict performance,

even of extremely complicated algorithms in extremely complicated scenarios.

So if that's our hypothesis, what we're going to do is run the experiment for

some input size N, some big N.

And maybe run it a bunch of times to get a good average on the running time.

If we wanted to, that's enough information to solve for a.

Whatever our time is for whatever N is, then a's the only unknown,

and we can go ahead and compute a.

But actually, we usually don't need to even do that.

We can just predict the time for some constant times N, just by doing the math.

So say we wanted to predict the time for

10N knowing that we have the running time for N.

Well, if you just do the math and take a(10N) log(10N) over aN log N, the a's cancel, so we don't even have to know the constant. It says that the running time is going to increase by a factor of 10(1 + 1/log_10 N). The correction term gets smaller as N gets bigger, so if you increase the problem size by a factor of 10, you're going to increase the running time by about a factor of 10.

And that's a very accurate prediction, and that's useful.

And we now teach this simple idea to first-year students, and in fact to nearly 70% of all Princeton students.

That if you run a program for a big input, and you have a decent model for

its performance, you can predict the running time.

So for example, we run it for N = 100,000, 100 times (and nowadays we can run as many experiments as we want), and it takes 4 seconds.

So that means that I'm going to predict that it's going to take

just a little more than 40 seconds for 1 million.

22:59

So this is an experiment: I run it for 1 million, it takes a little bit longer, and I observe a running time of 41 seconds.

That gives me a lot of confidence to predict how long it would take for

1 billion, which is maybe the problem that I really need to solve.

It's going to take 11 hours to solve for 1 billion.

And really, any programmer can have this kind of understanding of the performance of programs; it's all about getting that first model. If all you're interested in is predicting running times, it's very powerful.
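The ratio calculation can be sketched as follows (names are mine); under the hypothesis that the running time is ~ a N ln N, the constant a cancels out of the ratio:

```java
public class Predict {

    // Predicted ratio T(10N)/T(N) under T(N) ~ a N ln N:
    // (10N ln(10N)) / (N ln N) = 10 (1 + ln 10 / ln N) = 10 (1 + 1/log10 N).
    public static double ratio(double n) {
        return 10.0 * (1.0 + Math.log(10.0) / Math.log(n));
    }

    public static void main(String[] args) {
        // The correction term shrinks as N grows, so the ratio tends to 10.
        for (double n = 1e3; n <= 1e12; n *= 1e3)
            System.out.printf("N = %8.0e  ratio = %.2f%n", n, ratio(n));
    }
}
```

Multiplying an observed time at size N by this ratio predicts the time at size 10N, with no knowledge of the constant a.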

23:33

Now, I'm not saying that we shouldn't have an accurate mathematical model; it's still important to have one.

You might want to think about the reason for that.

And I'll show an example later on and you'll have an exercise.

The reason is, that there might be parameters that are involved

in the running time of the program, and

we might have the freedom to set the values of those parameters.

Even for one parameter, you'll see an example of that in the exercise.

We can't afford to run experiments for

every possible value of the parameter but if we have a mathematical model,

then we can use the mathematical model to find the optimum value of the parameter.

And that's a very strong reason to keep working towards good mathematical models.

Now, as I mentioned, the other thing that we really should do is validate our model in applications.

25:59

And so it was very important to

be able to shave every microsecond off that sorting running time.

And the machines of those days, and

particularly the Cray-1 wouldn't even work unless

every instruction took precisely the amount of time that it said in the manual.

And we could work with machine language versions of Quicksort and

predict the running time to within a microsecond.

That means predicting the running time of a million runs to within a second, or a billion runs to within a few hours.

26:57

What happened was that a user came along and

had an application that violated one of our assumptions.

He had files that had only a couple of distinct values, and his application performed poorly.

So people went in and looked more carefully, and over a few years' time, John Bentley, one of the researchers at Bell Laboratories, and I, and some others, came up with ideas for addressing this situation.

And those ideas are described in the fourth edition of Algorithms, where we modify the algorithm, with a better understanding of it, to handle this situation.

Now, going back to do the analysis, refine the model, and check it scientifically required some serious research, and it was only recently that we came to a resolution of how those algorithms perform in practice. But still, that work gave us library implementations of much more robust sorting algorithms, whose efficiency does not depend on the assumption that the keys are distinct.

And that much broader use of the libraries was enabled only by trying to get the analysis of algorithms right, by trying to get a model that reflects what's seen in applications.

And that's ongoing.

Just recently I was contacted by a colleague working at a networking start-up.

And what they need to do is sort a billion

records that are about a thousand characters each.

And they had to do it fast or

their competition would beat them and they'd go out of business.

And he found a refinement to our three-way partitioning methods for strings that was able to improve performance even further.

28:57

And so, again, this is 40 or 50 years of the analysis of algorithms really helping us understand the performance of this important problem.

Now, I'm not going to be talking that much about the systems side in this course, but I wanted to go through this example in detail because the math is very relevant to what we do in this course, and I want to make sure that people are motivated to look into the kind of math that we're going to be talking about.

So that's a full example of the analysis of an algorithm: Quicksort.

And Knuth succinctly

expressed the idea that I want to leave you with for this example.

29:45

People who analyze algorithms have double happiness.

First they experience the sheer beauty of elegant mathematical

patterns that surround elegant computational procedures.

Just in terms of the pure math what programs bring to the table

is another level of interesting mathematics.

But then, if you're in the analysis of algorithms, you get a practical payoff,

because your theories make it possible to get other jobs done more quickly and

more economically.

So, it's not just the math, it's that the math is helping us

press forward in all kinds of important and identifiable ways.

So that's a full introduction to the analysis of algorithms through the study

of Quicksort.