In this video I'll explain the mathematical analysis of the randomized
linear time selection algorithm that we studied in the previous video.
Specifically, I'm going to prove to you the following guarantee for that
algorithm. For every single input array of length n the running time of this
randomized selection algorithm on average will be linear. Pretty amazing if you
think about it, because that's barely more than the time it takes just to read the
input. And in particular this linear time algorithm is even faster than sorting.
So this shows that selection is a fundamentally easier problem than sorting.
You don't need to reduce to sorting. You can solve it directly in O(n) time.
I want to reiterate the same points I made about quick sort. The
guarantee is the same. It is a general purpose subroutine. We make no assumptions
about the data. This theorem holds no matter what the input array is. The expectation, the average that's in the theorem statement, is only over the coin flips made by the algorithm inside its code, code of our own devising. Before we plunge into the
analysis, let me just make sure you remember what the algorithm is. So it's
like quick sort. We partition around a pivot except we only recurse once, not
twice. So we're given an array with some length n. We're looking for the ith order
statistic, the ith smallest element. The base case is obvious. If you're not in the base case, you choose a pivot p uniformly at random from the input array, just like we did in quick sort. We partition around the pivot, again just like we did in quick sort. That splits the array into a first part of those elements less than the
pivot and the second part of those elements which are bigger than the pivot.
Now, we have a couple of cases. The case which is very unlikely, so we don't really worry about it, is if we're lucky enough to guess the pivot as the ith order statistic, exactly what we're looking for. That's when the new position j of the pivot element happens to equal i. Then, of course, we just return it; that was exactly what we wanted. In the general case, the pivot is going to wind up in some position j, where either j is bigger than the i we're looking for, that's when the pivot is too big, or j is less than i, its position falling below the ith order statistic we're looking for, and that's when the pivot is too small. So if the pivot's too big, if j is bigger than i, then what we're looking for is on the left hand side, amongst the elements less than the pivot. So that's where we recurse. We've thrown out both the pivot and everything to the right of it. That leaves us with an array of j minus 1 elements, and we're still looking for the ith smallest among these j minus 1 smallest elements. And then the final case, this is what we went
through in the quiz in the last video, is if we choose a pivot that's smaller than what we're looking for, that's when j is less than i. Then it means we're safe to throw out the pivot and everything less than it; we're safe recursing on the second part, on those elements bigger than the pivot. Having thrown out the j smallest elements, we're recursing on an array of length n minus j, and we're looking for the (i minus j)th smallest element in those that remain, having already thrown out the j smallest from the input array. So that's randomized selection.
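Just to keep the algorithm concrete while we analyze it, here is a minimal Python sketch of randomized selection. It follows the description above but is only illustrative: the names are my own, it assumes the array elements are distinct, and instead of partitioning in place it builds the two parts as new lists.

    import random

    def rselect(a, i):
        # Return the ith order statistic (1-indexed) of a list of distinct elements.
        n = len(a)
        if n == 1:                        # base case
            return a[0]
        p = random.choice(a)              # pivot chosen uniformly at random
        less = [x for x in a if x < p]    # first part: elements less than the pivot
        more = [x for x in a if x > p]    # second part: elements bigger than the pivot
        j = len(less) + 1                 # the pivot winds up in position j
        if j == i:                        # lucky case: the pivot is the answer
            return p
        elif j > i:                       # pivot too big: recurse on the first part,
            return rselect(less, i)       #   still looking for the ith smallest
        else:                             # pivot too small: recurse on the second part,
            return rselect(more, i - j)   #   now looking for the (i - j)th smallest

    print(rselect([10, 8, 2, 4, 5, 7, 1], 4))   # the 4th smallest element -> 5

With the algorithm in front of us, let's discuss why it's linear time on average. The first thought that you might have, and this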
would be a good thought, would be that we should proceed exactly the same way that
we did in Quick Sort. You recall that when we analyzed Quick Sort, we set up these
indicator random variables X_ij, determining whether or not a given pair of elements got compared at any point in the algorithm. And then we just realized that the total number of comparisons is just the sum, over all pairs, of these X_ij's. We applied
linearity of expectation and it boiled down to just figuring out the probability
that a given pair of elements gets compared. You can analyze this randomized
selection algorithm in exactly the same way. And it does give you a linear time
bound on average. But it's a little messy. It winds up being not quite as clean as in the
quick sort analysis. Moreover, because of the special structure of the selection
problem, we can proceed in an even more slick way here than the way we did with
quick sort. So, again we'll have some constituent random variables. We'll again
apply linearity of expectation but the definition of those random variables is
going to be a little bit different than it was in quick sort. So, first a preliminary
observation, which is that the workhorse for this randomized selection procedure is
exactly the same as it was in quick sort. Namely it's the partition subroutine.
Essentially all of the work that gets done outside of the recursive call just
partitions the input array around some pivot element, as we discussed in detail in a separate video, and that takes linear time. So usually when we say something's linear
time we just use big O notation. I'm gonna go ahead and explicitly use a constant c
here for the operations outside the recursive call. That'll make it clear that
I'm not hiding anything up my sleeves when we do the rest of the analysis. Now what I
wanna do on this slide is introduce some vocabulary, some notation which will allow
us to cleanly track the progress of this recursive selection algorithm. And by progress I mean the length of the array on which it's currently operating.
Remember we're hoping for a big win over quick sort, cuz here we only do one
recursive call, not two. We don't have to recurse on both sides of the pivot just on
one of them. So it stands to reason that we can think about the algorithm making more and more progress as its single recursive calls operate on arrays of smaller and smaller length. So the notion that will be important for this proof is
that of a phase. This quantifies how much progress we've made so far, with higher
numbered phases corresponding to more and more progress. We'll say that the RSelect algorithm, at some point in the middle of its execution, is in phase j if the array that the current recursive call is working on has length between (3/4)^(j+1) times n and the larger number (3/4)^j times n.
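In symbols, a recursive call operating on a subarray of current length m is in phase j when (up to how the exact boundary values are assigned, which doesn't matter for the analysis):

    (3/4)^{j+1} \cdot n \;\le\; m \;\le\; (3/4)^j \cdot n

For example,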
think about the case where j equals zero. That says phase zero recursive calls operate on arrays with size between 75 percent of n and n. So, certainly, the outermost recursive call is going to be in phase zero, because the input array has size n.
And then, depending on the choice of the pivot, you may or may not get out of phase
zero in the next recursive call. If you choose a good pivot, and you wind up
recursing on something, that has, at most, 75 percent of the original elements, you
will no longer be in phase zero. If you recurse on something that has more than 75
percent of what you started with, of the input array, then you're still gonna be in phase zero even in the second recursive call. So overall the phase number j
quantifies the number of times we've made 75 percent progress, relative to the
original input array. And the other piece of notation that's going to be important
is what I'm going to call Xj. So for a phase J, Xj simply counts the number of
recursive calls in which the randomized selection algorithm is in phase j. So this
is gonna be some integer. It could be as small as zero, if you think about it, for
some of the phases. Or it could be larger. So why am I doing this? Why am I making
these definitions of phases and of these Xj's? What's the point? Remember, the point was that we wanna be able to cleanly talk about the progress that the
randomized selection algorithm makes through its recursion, and what I wanna
now show you is that in terms of these XJs, counting the number of iterations in
each phase, we can derive a relatively simple upper bound on the number of
operations that our algorithm requires. Specifically, the running time of our algorithm can be bounded above by taking the running time spent in a given phase and then summing those quantities over all of the possible phases. So we're gonna start with
a big sum, over all the phases J. We want to look at the number of recursive calls
that we have to endure in phase J, so that's XJ by definition. And then we look
at the work that we do outside of the recursive calls in each recursive call
during phase j. Now, in a given recursive call, outside of its own recursive call, we do c times m operations, where m is the length of its input array, and during phase j we have an upper bound on the length of that input array: by definition it's at most (3/4)^j times n. That is, we multiply by this constant c, which we inherit from the partition subroutine, and then for the input length we can put in the upper bound of (3/4)^j times n. So just to review where all of these terms come from: the (3/4)^j times n is an upper bound on the array size during phase j, by the definition of a phase. Then, if we multiply that times c, that's the amount
of work that we do on each phase-j sub-problem. How much work do we do in phase j overall? Well, we just take the work per sub-problem, that's what's circled in yellow, and we multiply it by the number of such sub-problems we have. And, of course, we don't wanna forget any of our sub-problems, so we just make sure we sum over all of our phases j, to ensure that we count the work done in every one of the sub-problems. Okay? So, that's the upshot of this slide. We can upper bound the running time of our randomized algorithm very simply in terms of the phases and the Xj's, the number of sub-problems that we have to endure during phase j. This upper bound on our running time is important enough to give it notation; we'll call it star, and it will be the starting point of our final derivation when we complete the proof of this theorem.
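Written out, with T standing for the running time of RSelect on the given input array (my shorthand, not notation from the video), star is the inequality

    T \;\le\; \sum_{j \ge 0} X_j \cdot c \cdot (3/4)^j \cdot n \qquad (*)

Now don't forget, we're analyzing a randomized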
algorithm, so the left hand side of this inequality, the running time of RSelect, is a random variable. That's a different number depending on the outcome of the random coin flips of the algorithm; depending on the random pivots it has chosen, you will get different running times. Similarly, the right hand side of this inequality is also a random variable. That's because the Xj's are random variables: the number of sub-problems in phase j depends on which pivots get chosen. So what we care about is the expectations of these
quantities, their average values. So we're gonna start modestly, and as usual we'll extend our modest accomplishments to much more impressive ones using linearity of expectation. But our first modest goal is just to understand the average value of an Xj, the expected value of Xj. We're gonna do that in two
steps. On the next slide, I'm going to argue that to analyze the expectation of
XJ, it's sufficient to understand the expectation of a very simple coin flipping
experiment. Then, we'll analyze that coin flipping experiment. Then we'll have the
dominos all set up in a row. And on the final slide, we'll knock'em down and
finish the proof. So let's try to understand the average number of recursive
calls we expect to see in a given phase. So, again, just so we don't forget: Xj is defined as the number of recursive calls during phase j, where a recursive call is in phase j if and only if the current subarray length lies between (3/4)^(j+1) times n and the larger number (3/4)^j times n. So again, for example, phase zero is just the recursive calls in which the array length is between 75 percent and 100 percent of the original number of elements. So what I wanna do next is point out that a very
simple sufficient condition guarantees that we'll proceed from a given phase onto
the next phase. So it's a condition guaranteeing termination of the current
phase. And it's an event that we've discussed in previous videos, namely that the pivot that we choose gives a reasonably balanced split, 25-75 or better. So recall how partitioning works: we choose a pivot p, it winds up wherever it winds up, the stuff to the left of it is less than p, and the stuff to the right of it is bigger than p. By a 25-75 split or better, what I mean is that each of these, the first part and the second part, has at most 75 percent of the elements in the input array; both have at least 25 percent and at most 75 percent of them. And the key point is that if we wind up choosing a pivot that gives us a split at least this good, the current phase must end. Why must the
current phase end? Well, if we get a 25-75 split or better, then no matter which case we wind up in in the algorithm, we're guaranteed to recurse on a sub-problem that has at most 75 percent of what we started with. That guarantees that whatever phase we're in now, we're going to be in a higher-numbered phase when we recurse. Now, I want
you to remember something that we talked about before, which is that you've got a
decent chance when you pick a random pivot of getting something that gives you a 25,
75 split or better. In fact, the probability is 50 percent. Right? If you have an array that has the integers from one to 100 inclusive, then any pivot from 26 to 75 will do the trick. That'll ensure that at least the smallest 25 elements are excluded from a recursion on the right part, and at least the largest 25 elements are excluded from a recursion on the left part.
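If you'd like to sanity-check that 50 percent figure, here's a tiny illustrative count in Python (my own, not from the lecture); for n = 100 it counts the pivot ranks that leave at least a quarter of the elements on each side of the pivot:

    n = 100
    # k is the rank (order statistic) of the chosen pivot; a 25-75 split or better
    # means each side of the pivot keeps at least n/4 of the elements.
    good = [k for k in range(1, n + 1) if k - 1 >= n // 4 and n - k >= n // 4]
    print(len(good), len(good) / n)   # prints: 50 0.5

So this is why we can reduce our analysis of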
the number of recursive calls during a given phase, to a simple experiment
involving flipping coins. Specifically, the expected number of recursive calls that we're gonna see in a given phase j is no more than the expected number of
coin flips in the following experiment. Okay, so you've got a fair coin, 50
percent heads, 50 percent tails. You commit to flipping it until you see a head, and the question is: how many coin flips does it take, up to and including the first head that you see? So the minimum is one coin flip: if you hit a head the first time, it's one. If you get a tails and then a head, then it's two. If it's tails, tails, head, it's three, and so on, and you always stop when you hit that first head. So what's the correspondence? Well, suppose you're in phase j; if you choose a good pivot, one that gives you a 25-75 split or better, call that heads. It guarantees that you exit phase j, just like heads guarantees that you get to terminate the coin flipping experiment. Now, if you get a pivot which
doesn't give you a 25-75 split, you may or may not pass to a higher phase, but in the worst case you don't: you're stuck in phase j if you get a bad split, and that's like getting a tails in the coin flipping experiment, and you have to try again. This correspondence gives us a very elementary way to think about the progress that our randomized selection algorithm is making. So, there's one recursive call in every step of our algorithm, and each time we either choose a good pivot or a bad pivot, each with 50-50 probability. A good pivot
means we get a 75-25 split or better. A bad pivot means, by definition, we get a
split worse than 25-75. So what have we accomplished? We've reduced the task of
upper bounding the expected number of recursive calls in a phase J to
understanding the expected number of times you have to flip a fair coin before you get one heads.
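Before the exact calculation, here's an optional little simulation (illustrative only, not something from the lecture) that estimates that expectation empirically:

    import random

    def flips_until_first_heads():
        count = 0
        while True:
            count += 1
            if random.random() < 0.5:    # this flip came up heads
                return count

    trials = 100000
    print(sum(flips_until_first_heads() for _ in range(trials)) / trials)  # roughly 2

So on the next slide we'll give you the classical and precise answer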
to this coin flipping experiment. So, let me use capital N to denote the random
variable, which we were just talking about, the number of coin flips you need
to do before you see the first heads. And, it's not very important, but you should
know that these random variables have their own name. This would be a geometric
random variable with parameter one-half. So you can use a few different methods to
compute the expected value of a geometric random variable such as this, and brute
force using the definition of expectation works fine as long as you know how to
manipulate infinite sums. But for the sake of variety, let me give you a very sneaky
proof of what its expectation is. So the sneaky approach is to write the
expected value of this random variable in terms of itself and then solve for the
unknown, solve for the expectation. So let's think about it. How many coin flips do you need? Well, for sure you're gonna need one; that's the best case scenario. And now two things can happen: either you get heads, which has 50 percent probability, and you stop, or you get tails, which happens with 50 percent probability, and you start all over again, flipping coins until you get the first heads. On average, how many more flips does that take? Well, by the definition of capital N, you expect another E[N] coin flips in the case where you get tails and have to start all over. That gives the equation E[N] = 1 + (1/2) * E[N]. The one represents the first coin flip; the one-half is the probability that you can't stop, that you have to start all over, the probability of tails; and the extra E[N] is there because it's a memoryless process: when you start anew on the second coin flip, having gotten a tails, it's as if you're back at time one all over again. So now we have a trivial equation in terms of the unknown expected value of N, and the unique solution, the unique value that the expected value of capital N could have in light of this equation, is two.
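As an aside, the brute-force route mentioned a moment ago, applying the definition of expectation directly, gives the same answer via a standard infinite-sum manipulation (this calculation is not carried out in the video):

    E[N] \;=\; \sum_{k=1}^{\infty} k \cdot \Pr[N = k] \;=\; \sum_{k=1}^{\infty} k \cdot (1/2)^k \;=\; \frac{1/2}{(1 - 1/2)^2} \;=\; 2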
So, on average if you flip a fair coin and stop when you get heads, you're going to
see two coin flips on average. To make sure you haven't sort of lost the forest
for the trees, let me remind you why we were talking about this coin flipping
analysis in the first place. So recall that on the previous slide we talked about Xj, and remember Xj is the number of recursive calls in a given phase j, and we argued that the expected number of recursive calls you're gonna see in a phase is bounded above by the expected number of coin flips until the first heads. So this exact
calculation of two for the coin flips gives us an upper bound of two for the
number of recursive calls on average in any given phase J. So now that we've got
all our ducks lined up in a row, let's wrap up the proof on this final slide. So,
inherited from part one of the proof, we have an upper bound on the running time of the RSelect algorithm. This is what we were calling star on the earlier slide. In star, it looked a little messy, but we had the sum over the phases j, and we had two things that were independent of j: the constant c and the original input length n. So let me just yank the c and the n out front. And then we have this residual sum, over the phases j, of (3/4)^j times Xj: the (3/4)^j, remember, comes from our upper bound on the sub-problem size during phase j, and then of course we have to keep track of how many phase-j sub-problems we have to solve, and that by definition is Xj. Now, star was written in terms of the random variables, and now we're gonna go ahead and take expectations. Again, I have said this over and over, but don't forget where the expectation comes from: it's over the random pivot choices that our code makes. So the expected running time of the algorithm is at most the expectation of this star quantity. So like I said earlier, pretty much every time we do any analysis of a randomized process, we're gonna wind up using linearity of expectation at some point. Here is where we do it. Linearity of expectation says the
expectation of a sum is just the sum of the expectations. So we yank the c and the
n outside of the expectation. We yank the sum over phases outside of the expectation. We yank this (3/4)^j outside of the expectation. And then we just have the expected value of Xj, the average number of recursive calls we expect to see in phase j. Now, on the previous two slides, we figured out an upper bound on how many recursive calls we expect to see in each phase. First, by the reduction to the coin flip analysis, this is at most the expected number of coin flips N, which on the previous slide we argued was exactly two. So we bring that two out in front of the sum, since it no longer depends on j, and we get at most 2cn times the sum, over phases j, of three quarters raised to
the J. Now this kind of sum we have seen previously in the course. It came up when
we were analyzing the master method and we summed up our running time upper bounds
over the levels of our recursion tree. And if we weren't in case one, if we were in case two or three, we had geometric sums that were nontrivial. They require a certain formula to calculate, so let me remind you of that formula here, where three quarters is being raised to the power j. So this sum has value at most one over one minus the number that's getting powered, in this case three quarters; one minus three quarters is a quarter, take the reciprocal, and you get four. And the upshot
is that the expected number of operations that this randomized selection algorithm
uses to find the ith order statistic in a given input array is at most eight times c times n, where c is the hidden constant in the linear running time of partition. And so that completes the proof. The input array was arbitrary. We
showed the expected running time over the random choices of the algorithm is linear
in N. That is, only a constant factor larger than what is required to read the
input. Pretty amazing.
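To recap the final chain of inequalities in one place, using the same shorthand T for the running time of RSelect on the given input, together with E[Xj] <= E[N] = 2 and the geometric sum bound:

    E[T] \;\le\; E\Big[\sum_{j \ge 0} X_j \cdot c \cdot (3/4)^j \cdot n\Big] \;=\; c n \sum_{j \ge 0} (3/4)^j \, E[X_j] \;\le\; 2 c n \sum_{j \ge 0} (3/4)^j \;\le\; 2 c n \cdot \frac{1}{1 - 3/4} \;=\; 8 c n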