This is the second video of three, in which we prove that the average running time of randomized quicksort is big O of n log n. So, to remind you of the formal statements. So again we're thinking about quicksort where we implement the choose pivot sub routine to always choose a pivot uniformly at random from the sub array that it gets passed. And we're proving that for a worst case input array for an arbitrary input array of length n, the average running time of quicksort where the average is over the random pivot choices is big O of n log n. So let me remind you of the story so far. This is where we left things at the previous video. We defined a few random variables. The sample space, recall, is just the, all of the different things that could happen, that is all of the random coin flip outcomes that quicksort could produce. Which is equivalent to all of the pivot choices made by quicksort. Now, the random variables we care about. So first of all, there is C. Which is the number of comparisons between pairs of elements in the input array that quicksort makes for a given pivot sequence sigma. And then there are the xij's. And so that's just meant to count the number of comparisons involving the ith smallest and the jth smallest elements in the input array. Where you recall that zi and zj denote the ith smallest and jth smallest entries in the array. Now because every comparison involves some zi and some zj we can express C as a sum over the xij's. So we did that in the last video, we applied linearity at expectation, we used the fact that xij are zero one, that is indicator random variables to denote that to write the expectation of an xij just as the probability that it's equal to one and that gave us the following expression. So the key insight and really the heart of the quicksort analysis is to derive an exact expression for this probability as a function of i and j. So for example if the third smallest element in the array, the seventh smallest element in the array. Wherever they may be scattered in the input array we want to know exactly what's the probability that they get compared at some point in the execution of quicksort. And we're going to get a extremely precise understanding of this probability in the form of this key claim. So for all pairs of elements, and again, ordered pairs. So we're thinking of i being less than j. The probability that zi and zj get compared at some point in the execution of quicksort is exactly 2 divided by j- i + 1. So for example in this example of the third smallest element and the seventh smallest element, it would be exactly 40% of the time, two over five is how often those two elements would get compared if you ran quicksort with a random choice of pivots, and that's going to be true for every j and i. The proof of this key claim is the purpose of this video. So how do we prove this key claim? How do we prove that the probability that zi, zj get compared is exactly 2 over quantity j- i +s 1. Well fix your favorite ordered pair, so fix elements zi, zj with i less than j, for example the third smallest and the seventh smallest element in the array. Now, what we want to reason about is the set of all elements in the input array between zi and zj inclusive. And I don't mean between in terms of positions in the array, I mean between in terms of their values. So consider the set between zi and zj + 1 inclusive, so zi, zi + 1,... Zj- 1, Zj. So for example, the third, fourth, fifth, sixth and seventh smallest elements in the input array. Wherever they may be, okay. And of course, the initial array is not sorted, so there's no reason to believe that these j minus i plus 1 elements are contiguous, okay. They're scattered throughout the input array. But we're going to think about them, okay, zi through zj inclusive. Now throughout the execution of quicksort, these j minus i plus 1 elements lead parallel lives at least for awhile in the following sense. Begin with the outermost call to quicksort and suppose that none of these j minus i plus 1 elements is chosen as a pivot. Where then could the pivot lie? Well it can only be a pivot that's greater than all of these or it could be less than all of these. For example if this is the third fourth, fifth, sixth and seventh smallest elements in the array, well the pivot is either the minimum or the second minimum in which case it's smaller than all five elements, or it's the eighth or largest, or larger elements in the array in which case it's bigger than all of them. There's no way you're going to have a pivot that somehow is wedged in between this set because this is a contiguous set of order statistics, okay. Now what do I mean by these elements leading parallel lives? Well, in the case where the pivot is chosen to be smaller than all of these elements, then all of these elements will wind up to the right of the pivot. And they will all be passed to a common recursive call. The second recursive call. If the pivot is chosen to be bigger than all of these elements, then they'll all show up on the left side of the partitioned array. And they'll all be passed to the first recursive call. Iterating this or proceeding inductively, We see that as long as the pivot is not drawn from the set of j minus i plus 1 elements. This entire set will get passed on to the same recursive call. So these j minus i plus 1 elements are living blissfully together in harmony until the point in which one of them gets chosen as a pivot. And that, of course, has to happen at some point. The recursion only stops when the array length is equal to zero or one. So, if for no other reason at some point there will be no other elements in a recursive call other than these j minus i plus 1, okay. So at some point, the reverie is interrupted and one of them is chosen as a pivot. So let's pause the quicksort algorithm and think about what things look like at the time when one of these j minus i plus 1 elements is first chosen as a pivot element. There are two cases worth distinguishing between. In the first case the pivot happens to be either zi or zj. Now remember what it is we're trying to analyze. We're trying to analyze the frequency, the probability with when zi and zj gets compared. Well, if zi and zj are in the same recursive call, and one of them gets chosen as the pivot, then they're definitely going to get compared. Remember when you partition an array around this pivot element, the pivot get's compare to everything else. So if zi's chosen as a pivot, it certainly get's compare to zj. If zj gets chosen as a pivot, it gets compared to zi. So either way if one of these two is chosen, they're definitely compared. If, on the other hand, the first of these j minus i plus 1 elements to be chosen as a pivot is not zi or zj. If, instead, it comes from the set zi plus 1, so on, up to zj minus 1. Then the opposite is true. Then zi and zj are not compared now. Nor will they ever be compared in the future. So why is that? Well that requires two observations. First recall that when you choose a pivot and you partition an array, all of the comparisons involve the pivot. So two elements neither of which is the pivot do not get compared in a partition sub routine. So they don't get compared right now. Moreover, since zi is the smallest of these and Zj is the biggest of these, and the pivot comes from somewhere between them. This choice of pivot will split zi and zj into different recursive calls, zi gets passed to the first recursive call, zj gets passed to the second recursive call and they will never meet again. So there's no comparison's in the future, either. So these two observations right here I would say is the key insight in the quicksort analysis. The fact that for a given pair of elements we can very simply characterize exactly when they get compared and when they do not get compared in the quicksort algorithm. That is they get compared exactly when one of them is chosen as the pivot before any of the other elements with value in between those two has had the opportunity to be a pivot. That's exactly when they get compared. So this will allow us to prove this key claim, this exact expression on the comparison probability. That will plug into the formula we had earlier and will give us the desired bound on the average number of comparisons. So let's fill in those details. So first let me rewrite the high order bit from the previous slide. So now at last we will use the fact that our quicksort implementation always chooses a pivot uniformly at random. That each element of a sub array is equally likely to serve as the pivot element in the corresponding partition call. So what does this buy us? This just says all of the elements are symmetric. So each of the elements zi, zi plus 1, all the way to zj, is equally likely to be the first one asked to serve as a pivot element. Now the probability that zi and zj get compared is simply the probability that we're in case one, as opposed to in case two. And since each element is equally likely to be the pivot, that just means there's sort of two bad cases, two cases in which one can occur out of the j minus i plus 1 possible different choices of pivot. Now we're talking about a set of j minus i plus 1 elements. Each of whom is equally likely to be asked to be served first as a pivot element. And the bad case, the case that leads to a comparison, there's two different possibilities for that. It was zi or zj is first. And the other j minus i minus 1 outcomes lead to the good case where zi and zj never get compared. So overall, because everybody's equally likely to be the first pivot, we have that the probability with zi and zj get compared. Is exactly the number of pivot choices that lead to comparison, divided by the number of pivot choices overall. And that is exactly the key claim. That is exactly what we asserted was the probability that a given zi and zj get compared for no matter what i and j are. So, wrapping up this video, where does that leave us? We can now plug in this expression for this probability of comparison probabilities. Into the double sum that we had before. So putting it all together what we have is that what we really care about the average number of comparisons that quicksort makes on this particular input of array n, of length n is just this double sum which iterates over all possible ordered pairs ij. And what we had here before was the probability of comparing zi and zj we now know exactly what that is so we just substitute. And this is where we're going to stop for this video so this is going to be our key expression star which we still need to evaluate but that's going to be the third video. So essentially we've done all of the conceptual difficulty in understanding where comparisons come from in the quicksort algorithm. All that remains is a little bit of an algebraic manipulation to show that this starred expression really is big O log n. And that's coming up next.