0:00

I've said pretty much everything I want to say about sorting at this point but

I do want to cover one more related topic.

Namely the selection problem.

This is a problem of computing ordered statistics of an array

with computing the median of an array being a special case.

Analogous to our coverage of quick sort the goal is going to be the design and

analysis of a super practical randomized algorithm that solves the problem.

And this time, we'll even achieve an expected running time that is linear

in the length of the input array.

That is big O of n for input arrays of length n, as opposed

to the big O of n log n time that we had for the expected running time of quick sort.

Like quick sort, the mathematical analysis is also going to be quite elegant.

So in addition these two required videos on this very practical algorithm will

motivate two optional videos that are on very cool topics but

of a similar more theoretical nature.

The first optional video is going to be on how you solve the selection problem in

deterministic linear time.

That is without using randomization.

And the second optional video will be a sorting lower bound that is why no

comparison based sort can be better than merge sort.

Can have a better running time than big O of n log n.

1:04

So a few words about what you should have fresh in your mind before you

watch this video.

I am definitely assuming that you watched the quicksort videos.

And not just watched them but

that you have that material pretty fresh in your mind.

So in particular the quicksort video about the partition subroutine, so

this is where you take an input array and you choose a pivot and you do repeated swaps.

You rearrange the array so that everything less than the pivot is to the left of it.

Everything bigger than the pivot is to the right of it.

You should remember that subroutine,

you should also remember the previous discussion about pivot choices.

The idea that the quality of a pivot depends on how balanced a split into two

different sub problems it gives you.

Those are both going to be important.

For the analysis of this randomized linear time selection algorithm I need you to

remember the concepts from probability review part one.

In particular, random variables, their expectation, and linearity of expectation.

That said, let's move on and formally define what the selection problem is.

The input is the same as for

the sorting problem, just you're given an array of n distinct entries.

But in addition, you're told what order statistic you're looking for.

So that's going to be a number i, which is an integer between 1 and n.

And the goal is to output just a single number.

Namely the ith order statistic,

that is the ith smallest entry in this input array.

3:03

If the array has even length, there's two possible medians, so

let's just take the smaller of them, that's the n over 2th order statistic.

You might wonder why you'd ever want to compute the median of an array rather than

the mean, that is the average.

It's easy to see that you can compute the average just with a simple

linear scan.

As for the median, one motivation is that it's a more robust version of the mean.

So if you just have a data entry problem and it corrupts one element of an input

array it can totally screw up the average value of the array, but

it has generally very little impact on the median.

Final comment about the problem is that I am going to assume that the array entries

are distinct, that is there's no repeated elements.

But just like in our discussions of sorting, this is not a big assumption.

I encourage you to think about how to adapt these algorithms to work even if

the arrays do have duplicates.

You can, indeed, still get the same very practical,

very fast algorithms with duplicate elements.

Now if you think about it, we already have a pretty darn good algorithm that solves

the selection problem.

Here's the algorithm.

It's two simple steps and it runs in big O of n log n time.
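In code, those two steps are just a couple of lines. Here's a sketch (the helper name `select_by_sorting` is my own, and order statistics are 1-indexed as in the lecture):

```python
def select_by_sorting(a, i):
    """Return the ith order statistic (ith smallest, 1-indexed)
    of a list of distinct numbers, via reduction to sorting."""
    b = sorted(a)      # step 1: sort the array -- O(n log n)
    return b[i - 1]    # step 2: the ith smallest sits at index i - 1
```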

4:23

So that's pretty cool, we've just done what a computer scientist would

call a reduction and that's a super useful and super fundamental concept.

It's when you realize that you can solve one problem by reducing it to another

problem that you already know how to solve.

So what we just showed is that the selection problem reduces easily

to the sorting problem.

We already know how to solve the sorting problem in n log n time so

that gives an n log n time solution to this selection problem.

But again, remember the mantra of any algorithm designer worth their salt:

can we do better?

We should avoid contentedness.

Just because we got n log n doesn't mean we should stop there.

Maybe we can be even faster.

Now certainly we're going to have to look at all the elements in the input array,

in the worst case.

You shouldn't expect to do better than linear, but hey, why not linear time?

5:11

Actually if you think about it, we probably should have asked that question

back when we were studying the sorting problem.

Why were we so content with the n log n time bound for merge sort?

And the big O of n log n time-on-average bound for quicksort?

Well it turns out, we have a really good reason to be happy with our n log n

upper bounds for the sorting problem.

5:41

So in other words, if we insist on solving the selection problem via

a reduction to the sorting problem then we're stuck with this N log N time bound.

Okay, strictly speaking that's for

something called comparison-based sorts, see the optional video for more details, but

the upshot is that the bound applies if you want a general purpose algorithm.

If we want to do better than n log n for

selection, we have to do it using ingenuity beyond this reduction;

we have to prove that selection is a strictly easier problem than sorting.

That's the only way we're going to have an algorithm that beats n log n.

That's the only way we can conceivably get a linear time algorithm.

And that is exactly what is up next on our plates.

We're going to show selection is indeed fundamentally easier than sorting.

We can have a linear time algorithm for it,

even though we can't get a linear time algorithm for sorting.

You can think of the algorithm we're going to discuss as a modification of

quick sort and in the same spirit of quick sort it will be a randomized algorithm.

And the running time will be an expected running time that will hold for

any input array.

6:40

Now, for the sorting problem, we know that quicksort runs in n

log n time on average, where the average is over the coin flips done by the code.

But we also know that if we wanted to,

we could get a sorting algorithm in n log n time that doesn't use randomization.

The merge sort algorithm is one such solution.

So here, we're giving a linear time solution for

selection, for finding order statistics that uses randomization.

And it would be natural to wonder, is there an analog to merge sort?

Is there an algorithm which does not use randomization, and

gets this exact same linear time bound?

In fact there is.

The algorithm's a little more complicated, and

therefore not quite as practical as this randomized algorithm.

But it's still very cool.

It's a really fun algorithm to learn and to teach.

So I will have an optional video about linear time selection

without randomization.

So for those of you who aren't going to watch that video or

want to know what's the key idea.

The idea is to choose the pivot deterministically in a very careful way

using a trick called the median of medians.

That's all I'm going to say about it now you should watch the optional video if you

want more details.

7:54

So what I want to do next is develop the idea that we can modify the QuickSort

paradigm in order to directly solve the selection problem.

So to get an idea of how that works, let me review the Partition subroutine.

Like in Quicksort this subroutine will be our workhorse for the selection algorithm.

So, what the Partition subroutine does, it takes as inputs, some jumbled up array and

it's going to solve a problem which is much more modest than sorting.

So in partitioning, it's going to first choose a pivot element somehow.

We'll have to discuss what's a good strategy for choosing a pivot element.

But suppose in this particular input array it chooses the first element,

this three, as the pivot element, the responsibility of the partition

sub-routine then is to rearrange the elements in this array so

that the following properties are satisfied.

Anything less than the pivot is to the left of it and it can be in jumbled order.

But if you're less than the pivot you'd better be to the left, like this two and

one, which are less than three.

If you're bigger than the pivot then again you can be in jumbled order amongst those

elements but all of them have to be to the right of the pivot and

that's true for the numbers four through eight.

They all are to the right of the pivot three in a jumbled order.

So this in particular puts the pivot in its rightful position,

where it will belong in the final sorted array.
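As a refresher, here is a minimal sketch of that partition subroutine, assuming (as in the Quicksort videos) that the pivot has already been swapped into the first position of the subarray:

```python
def partition(a, left, right):
    """Partition a[left..right] (inclusive) in place around the pivot
    a[left]. Returns the pivot's final, rightful index."""
    pivot = a[left]
    boundary = left + 1  # a[left+1 .. boundary-1] holds elements < pivot
    for j in range(left + 1, right + 1):
        if a[j] < pivot:
            a[boundary], a[j] = a[j], a[boundary]
            boundary += 1
    # Swap the pivot into place, between the two regions.
    a[left], a[boundary - 1] = a[boundary - 1], a[left]
    return boundary - 1
```

On the lecture's example with pivot 3 in front, everything smaller ends up to the pivot's left and everything bigger to its right, each side in jumbled order.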

And at least for

Quicksort, it enabled us to recursively sort two smaller subproblems.

So this is where I want you to think a little bit about how we should adapt

this paradigm.

So, suppose I told you the first step of our selection algorithm is going to be

choose a pivot and partition the array.

Now the question is, how are we going to recurse?

We need to understand how to find the ith order statistic of the original

input array.

It suffices to recurse on just one sub problem of smaller size,

and find a suitable order statistic in it.

So how should we do that?

Let me ask you that with a very concrete example

about what pivot we choose and what order statistic we're looking for, and

see what you think.

So the correct answer to this quiz is the second answer.

9:53

So we can get away with recursing just once, and in this particular example,

we're going to recurse on the right side of the array.

And instead of looking for the fifth order statistic like we would originally,

we're going to recursively search for the second order statistic.

So why is that?

Well first why do we recurse on the right side of the array?

So by assumption we have this array of ten elements, we choose the pivot,

we do partitioning, remember the pivot winds up in its rightful position.

That's what partitioning does.

So if it winds up in the third position,

we know it's the third smallest element in the array.

Now that's not what we were looking for.

We were looking for the fifth smallest element in the array.

That, of course, is bigger than the third smallest element of the array.

So by partitioning, where is the fifth element going to be?

It's gotta be to the right of this third smallest element,

to the right of the pivot.

So we know for sure that the fifth order statistic of the original array

lies to the right of the pivot.

That is guaranteed.

So we know where to recurse on the right hand side.

Now, what are we looking for?

We are no longer looking for the fifth order statistic,

the fifth smallest element.

Why?

Well we've thrown out both the pivot and everything smaller than it.

Remember we're only recursing on the right hand side.

So we've thrown out the pivot, the third element, and everything less than it,

the minimum and the second minimum.

Having deleted the three smallest elements, and originally looking for

the fifth smallest, in what remains, in what we're recursing on,

we're looking for the second smallest element.

So the selection algorithm in general, is just the generalization of this idea.

So, arbitrary arrays and arbitrary situations of whether the pivot comes back

equal to, less than, or bigger than the element you are looking for.

So let me be more precise, I am going to call this algorithm R select for

randomized selection, and according to the problem definition it takes as input,

as usual an array A of some length n.

Then also the order statistic that we are looking for, so we are going to call

that i, and of course we assume that i is some integer between one and n, inclusive.

So for the base case, that is going to be if the array has size one,

then the only element we could be looking for is the first order statistic and

we just return the sole element of the array.

12:10

Now we have to partition the array around the pivot element.

And just like in quick sort, we're going to be very lazy about choosing the pivot.

We're going to choose it uniformly at random from the n possibilities, and

hope things work out.

And that will be the crux of the analysis,

proving that random pivots are good enough sufficiently often.

Having chosen the pivot, we now just invoke the standard partitioning

subroutine.

As usual, that's going to give us the partitioned array.

You'll have the pivot element, you'll have everything less than the pivot to the left,

everything bigger, to the right.

As usual, I'll call everything to the left,

the first parts of the partitioned array.

And everything bigger, the second part.

12:45

Now we have a couple of cases, depending on whether the pivot is bigger or

less than the element we are looking for.

So I need a little notation to talk about that.

So let's let j be the order statistic that p is.

So if p winds up being the third smallest element like in the quiz,

then j's going to be equal to three.

Equivalently, we can think of j as defined as the position of

the pivot in the partitioned version of the array.

Now there's one case, which is very unlikely to occur, but

we should include it just for completeness.

If we're really lucky, then, in fact,

a random pivot just happens to be the order statistic we were looking for.

That's when i equals j.

We're looking for the ith smallest element.

If by dumb luck the pivot winds up being the ith smallest element, we're done.

We can just return it.

We don't have to recurse.

13:32

Now in general of course,

we don't randomly choose the element we are looking for.

We choose something that could be bigger or could be smaller than it.

In the quiz we chose a pivot that was smaller than what we were looking for.

Actually, that's the harder case.

So, let's first start with a case,

where the pivot winds up being bigger than the element we were looking for.

So that means that j is bigger than i.

We're looking for the ith smallest.

We randomly chose the jth smallest, for j bigger than i.

So this is the opposite case of the quiz.

This is where we know what we're looking for has to be to the left of the pivot.

The pivot's the jth smallest, everything less than it is to the left.

We're looking for the ith smallest, and i is less than j, so

that's got to be on the left.

That's where we recurse.

Moreover, it's clear we're looking for exactly the same order statistic.

If we're looking for the third smallest element, we're only throwing out stuff

which is bigger than or equal to the pivot, itself even bigger than the third smallest element, so

we're still looking for the third smallest of what remains.

And naturally the new array size is j minus 1 because that's what's to

the left of the pivot.

And then finally, the final case is when the random element that we choose is less

than what we're looking for and then we're just like the quiz.

Namely what we're looking for is bigger than the pivot.

It's got to be in the right-hand side.

We know we've got a recurse in the right-hand side.

Whenever the right-hand side has n minus j elements,

we throw out everything up to the pivot.

So we throw out j things.

There's n minus j left.

All of those j things we threw out are less than what we're looking for.

So if we used to be looking for the ith smallest element, now we're looking for

the (i minus j)th smallest element.
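Putting the base case and the three cases together, the whole algorithm can be sketched as follows. This is a sketch in my own notation, not the lecture's pseudocode verbatim; it partitions in place and keeps order statistics 1-indexed:

```python
import random

def rselect(a, i):
    """Return the ith order statistic (1-indexed) of a sequence of
    distinct elements, in expected linear time."""
    a = list(a)  # work on a copy so the caller's data is untouched

    def partition(left, right):
        # Standard partition around a[left]; returns the pivot's final index.
        pivot = a[left]
        boundary = left + 1
        for j in range(left + 1, right + 1):
            if a[j] < pivot:
                a[boundary], a[j] = a[j], a[boundary]
                boundary += 1
        a[left], a[boundary - 1] = a[boundary - 1], a[left]
        return boundary - 1

    def go(left, right, i):
        if left == right:                      # base case: one element
            return a[left]
        p = random.randint(left, right)        # uniformly random pivot
        a[left], a[p] = a[p], a[left]          # move pivot to the front
        j = partition(left, right) - left + 1  # pivot's order statistic here
        if j == i:                             # lucky case: pivot is it
            return a[left + j - 1]
        elif j > i:                            # answer lies left of pivot
            return go(left, left + j - 2, i)
        else:                                  # answer lies right; shift i
            return go(left + j, right, i - j)

    return go(0, len(a) - 1, i)
```

Note that in the third case the recursion looks for the (i minus j)th smallest among the n minus j elements to the pivot's right, exactly as in the quiz.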

14:56

So that is the whole algorithm.

That is how we take the approach we used for the sorting problem in quick sort and

adapt it to the problem of selection.

So, is this algorithm any good?

Let's start studying its properties and understand how well it works.

So let's begin with correctness.

So the claim is that, no matter how the algorithm's coin flips come up,

no matter what random pivots we choose, the algorithm is correct.

In the sense that it's guaranteed to output the ith order statistic.

The proof is by induction.

It proceeds very similarly to quick sort.

So I'm not going to give it here.

If you're curious about how these proofs go,

there's an optional video about the correctness of quick sort.

If you watch that and understand it, it should be clear how to adapt

that inductive argument to apply to this select algorithm as well.

15:39

So as usual for divide and conquer algorithms, the interesting part is not so

much knowing, understanding why the algorithm works, but

rather understanding how fast it runs.

So the big question is, what is the running time of this selection algorithm?

Now, to understand this we have to understand the ramifications of

pivot choices on the running time.

So you've seen the QuickSort videos they're fresh in your mind so

what should be clear is that just like in QuickSort how

fast this algorithm runs is going to depend on how good the pivots are and

what good pivots means is pivots that guarantee a balanced split.

16:29

So the correct answer to this question is exactly the same as the answer for

QuickSort.

The worst case running time, if the pivots are chosen in a really unlucky way,

is actually quadratic in the array length.

Remember, we're shooting for linear time.

So this quadratic is a total disaster.

So how could this happen?

Well suppose you're looking for the median, and

suppose you choose the minimum element as the pivot every single time.

So if this is what happens, if every time you choose a pivot to be the minimum,

just like in QuickSort, this means every time you recurse,

all you succeed in doing is peeling off a single element from the input array.

Now, you're not going to find the median element until you've made roughly

n over 2 recursive calls,

each on an array that has size at least a constant fraction of the original one.

So it's a linear number of recursive calls,

each on an array of size at least some constant times n.

So that gives you a total running time of quadratic overall.
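To make that quadratic bound concrete: partitioning a subarray of size m scans all of it, so assume it costs at least c times m for some constant c greater than zero. Peeling off one element per call until we reach the median then gives

```latex
T(n) \;\ge\; c\Big(n + (n-1) + \cdots + \tfrac{n}{2}\Big)
     \;\ge\; c \cdot \frac{n}{2} \cdot \frac{n}{2}
     \;=\; \Omega(n^2).
```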

Of course, this is an absurdly unlikely event.

Frankly, your computer is more likely to be struck by a meteor than it is for

the pivot to be chosen as the minimum element in every recursive call.

But if you really have an absolutely worst case choice of pivots,

it would give this quadratic running time bound.

17:54

So the key to a fast running time is going to be the usual property that we want to

see in divide and conquer algorithms, namely, every time we recurse, the problem

size had better not just be smaller, but smaller by a significant factor.

How would that happen in the selection approach based on the partition

subroutine?

Well if both of the sub-problems are not too big,

then we're guaranteed that when we recurse we make a lot of progress.

So let's think about what the best possible pivot would be

in the sense of giving a "balanced" split, right, so of course in some sense the best

pivot is if you just happened to choose the order statistic you're looking for.

Then you're done in constant time.

But that's extremely unlikely, and it's not worth worrying about.

So ignore the lucky case where we happen to guess the desired order statistic as the pivot.

What's the best pivot if we want to guarantee an aggressive

decrease in the input size for the next recursive call?

Well, the best pivot is the one that gives as balanced a split as possible.

So what's the pivot that gives us the most balanced split?

A 50/50 split. If you think

about it, that's exactly the median.

Of course, this is not super-helpful,

because the median might well be what we're looking for in the first place.

So this is sort of a circular idea.

But for intuition, it's still worth exploring what kind of running time we

would get in the best case, right?

If we're not going to get linear time even in this magical best case,

we certainly wouldn't expect to get it on average over random choices of the pivots.

So what would happen if we actually did luckily choose the median as the pivot

every single time?

Well, we get a recurrence for the running time that

the algorithm requires on an array of length n.

Well, there's only going to be one recursive call.

So this is the big difference from QuickSort

where we had to recurse on both sides and we had two recursive calls.

So here, we're only going to have one recursive call.

In the magical case where our pivots are always equal to the median,

both sub-problem sizes are only half as large as the original one.

So when we recurse, it's on a problem whose size is guaranteed

to be at most n over two, and then outside the recursive call pretty much

all we do is a partitioning invocation, and we know that that is linear time.

So the recurrence we get is T of n is at most T of n over two plus big O of n.

This is totally ready to get plugged into the master method.

It winds up being case two of the master method and

indeed we get exactly what we wanted, linear time.
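You can also verify the linear bound directly, without the master method, by unrolling the recurrence. Writing c for the constant hidden in the big-O of the partitioning work,

```latex
T(n) \;\le\; cn + c\frac{n}{2} + c\frac{n}{4} + \cdots
     \;\le\; cn \sum_{k=0}^{\infty} 2^{-k}
     \;=\; 2cn \;=\; O(n).
```

The subproblem sizes shrink geometrically, so the top-level linear work dominates the whole sum.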

20:07

To reiterate this is not interesting in its own right.

This is just for intuition.

This was a sanity check that at least for a best case choice of pivots

we'd get what we want, the linear time algorithm and we do.

Now, the question is how well do we do with random pivots?

Now the intuition, the hope, is exactly as it was for QuickSort, which is that random

pivots are a perfectly good surrogate for the median, for the perfect pivot.

20:31

So having the analysis of Quicksort under our belt, where indeed random pivots

do approximate very closely the performance you get with best case pivots,

maybe now we have reason to believe that this is true.

That said, as a mathematical statement this is totally not obvious and

it's going to take a proof.

That's the subject for the next video.

Let me just be clear exactly what we're claiming.

Here is the running time guarantee that randomized selection provides.

20:57

For an arbitrary input array of input length n,

the average running time of this randomized selection is linear.

Big O of n. Let me reiterate a couple of points I made

for the analogous guarantee for the QuickSort algorithm.

The first is that we're making no assumptions about the data whatsoever.

In particular we're not assuming that the data is random.

This guarantee holds,

no matter what input array you feed into this randomized algorithm.

In that sense, this is a totally general purpose subroutine.