This optional video will be, more or less, the last word we have on sorting for the purposes of this course. And it'll answer the question: can we do better? Remember, that's the mantra of any good algorithm designer. I've shown you n log n algorithms for sorting: Merge Sort in the worst case, Quick Sort on average. Can we do better than n log n? Indeed, for the selection problem, we saw we could do better than n log n; we could do it in linear time. Maybe we can do linear time for sorting as well. The purpose of this video is to explain to you why we cannot sort in linear time. So this is a rare problem where we understand quite precisely how well it can be solved, at least for a particular class of algorithms called comparison-based sorts, which I'll explain in a moment. So here's the form of the theorem I want to give you the gist of in this video. In addition to restricting to comparison-based sorts, which is necessary as we'll see in a second, I'm going to make a second assumption which is not necessary but is convenient for the lecture, which is that I'm only going to think about deterministic algorithms for the moment. I encourage you to think about why the same style of argument gives an n log n lower bound on the expected running time of any randomized algorithm; maybe I'll put that on the course site as an optional theory problem. So, in particular, Quick Sort is optimal in the randomized sense: it has average running time O(n log n), and the theorem, extended to randomized algorithms, says that no comparison-based sort can be better than that, even on average.

So I need to tell you what I mean by a comparison-based sorting algorithm. What it means is a sorting algorithm that accesses the elements of the input array only via comparisons; it does not do any kind of direct manipulation of a single array element. All it does is pick pairs of elements and ask the question: is the left one bigger or is the right one bigger? I like to think of comparison-based sorts as general-purpose sorting routines. They make no assumptions about what the data is, other than that it's drawn from some totally ordered set. I like to think of such a routine as a function that takes as an argument a function pointer that allows it to do comparisons between abstract data types. There's no way to access the guts of the elements; all you can do is go through this API, which allows you to make comparisons. And indeed, if you look at the sorting routine in, say, the Unix operating system, that's exactly how it's set up: you just pass in a function pointer to a comparison operator. I know this sounds super abstract, so I think it becomes clear once we talk about some examples. There are famous examples of comparison-based sorts, including everything we've discussed in the class so far. There are also famous examples of non-comparison-based sorts, which we're not going to cover, but perhaps some of you have heard of them, or at the very least they're very easy to look up on Wikipedia or wherever. So examples include the two sorting algorithms we've discussed so far. Merge Sort: the only way Merge Sort interacts with the elements of the input array is by comparing them and by copying them. Similarly, the only thing Quick Sort does with the input array elements is compare them and swap them in place. For those of you who know about the heap data structure, which we'll be reviewing later in the class: heap sort, where you just heapify a bunch of elements and then extract the minimum n times, also uses only comparisons.
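To make the "comparisons only" interface concrete, here's a minimal sketch (my own illustration, not from the lecture): a merge sort that can learn about the data only by calling a comparator it was handed, much like passing a function pointer to the C library's qsort. The function name `merge_sort` and the `comes_before` convention are just assumptions for this example.

```python
def merge_sort(items, comes_before):
    """Sort a list using only the two-argument comparator comes_before(a, b)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], comes_before)
    right = merge_sort(items[mid:], comes_before)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # The ONLY way the algorithm learns anything about the data:
        if comes_before(right[j], left[i]):
            merged.append(right[j]); j += 1
        else:
            merged.append(left[i]); i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

# Works on any totally ordered data, given a comparator:
print(merge_sort([5, 3, 8, 1, 9, 2], lambda a, b: a < b))      # [1, 2, 3, 5, 8, 9]
print(merge_sort(["pear", "fig", "apple"], lambda a, b: a < b))
```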
So what are some famous non-examples? I think this will make it even clearer what we're talking about. Bucket sort is one very useful one. Bucket sort is used most frequently when you have some kind of distributional assumption on the data that you're sorting. Remember, that's exactly what I'm focusing on avoiding in this class: I'm focusing on general-purpose subroutines where you don't know anything about the data. If you do know stuff about the data, bucket sort can sometimes be a really useful method. For example, suppose you can model your data as i.i.d. samples from the uniform distribution on [0, 1]. So they're all rational numbers, bigger than zero, less than one, and you expect them to be evenly spread through that interval. Then what you can do in bucket sort is preallocate n buckets where you're going to collect these elements, each one with the same width, one over n. Then you do a single linear pass over the input array: everything between zero and one over n you stick in the first bucket, everything between one over n and two over n you stick in the second bucket, everything between two over n and three over n you stick in the third bucket, and so on. So with that single pass you've classified the input elements according to which bucket they belong in. Now, because the data is assumed to be uniform at random, you expect each of the buckets to have a very small population, just a few elements in it. Remember, if an element is drawn uniformly from the interval [0, 1], then it's equally likely to land in each of the n available buckets, and since there are n elements, you only expect one element per bucket. Having bucketed the data, you can now just use, say, insertion sort on each bucket independently. You're doing insertion sort on a tiny number of elements, so that runs in constant time per bucket, and there's a linear number of buckets, so it's linear time overall. So the upshot is: if you're willing to make really strong assumptions about your data, like it's drawn uniformly at random from the interval [0, 1], then there's no n log n lower bound; in fact, you can elude the lower bound and sort in linear time.

So, just to be clear, in what sense is bucket sort not comparison-based? In what sense does it look at the guts of its elements and do something other than access them via pairwise comparisons? Well, it actually looks at an element of the input array and asks what its value is: it checks whether the value is .17 versus .27 versus .77, and according to what value it sees inside the element, it decides which bucket to allocate it to. So it actually stares at the guts of an element to decide what to do next.

Another non-example, which can be quite useful, is counting sort. This sorting algorithm is good when your data, and again we're making an assumption on the data, consists of integers, and small integers at that: they're between zero and k, where k is, say, ideally at most linear in n. Then what you do is a single pass through the input array; again, you just bucket the elements according to their value, which is somewhere between zero and k and is an integer by assumption, so you need k buckets. Then you do a pass that depopulates the buckets and copies their contents into an output array. And that gives you a sorting algorithm which runs in time O(n + k), where k is the size of the biggest integer.
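Here are minimal sketches of both non-examples, purely as illustration and not from the lecture itself; the function names are mine, and the assumptions are flagged in the comments. Note how both routines look directly at element values, which is exactly what a comparison-based sort is not allowed to do.

```python
def bucket_sort(data):
    """Assumes data is (roughly) uniform in [0, 1); expected linear time."""
    n = len(data)
    buckets = [[] for _ in range(n)]          # n buckets, each of width 1/n
    for x in data:
        buckets[int(x * n)].append(x)         # looks at the value of x directly
    result = []
    for bucket in buckets:
        # The lecture uses insertion sort per bucket; sorted() here for brevity.
        result.extend(sorted(bucket))         # each bucket has O(1) elements in expectation
    return result

def counting_sort(data, k):
    """Assumes data consists of integers in 0..k; runs in O(n + k) time."""
    counts = [0] * (k + 1)
    for x in data:
        counts[x] += 1                        # again, uses the value itself as a bucket index
    result = []
    for value, count in enumerate(counts):    # depopulate the buckets in order
        result.extend([value] * count)
    return result

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.94, 0.21]))
print(counting_sort([4, 1, 3, 4, 3, 0, 2], k=4))
```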
So the upshot with counting sort is that if you're willing to assume the data consists of integers bounded above by something proportional to n, then you can sort in linear time. Again, counting sort does not access the array merely through comparisons: it actually stares at an element, figures out what its value is, and uses that value to determine which bucket to put the element in. So in that sense it's not a comparison-based sort, and under its assumptions it can beat the n log n lower bound.

A final example I'll mention is radix sort. I think of it as an extension of counting sort, although you don't have to use counting sort as the inner loop; you can use other so-called stable sorts as well. It's the stuff you can read about in many programming books or on the web. The upshot of radix sort: you again assume the data are integers, you think of them in digit representation, say binary representation, and you just sort one digit at a time, starting from the least significant digits and going all the way up to the most significant digits. And the upside of radix sort, as an extension of counting sort, is that if your data consists of integers that are not too big, polynomially bounded in n, then it lets you sort in linear time.
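Here's a minimal least-significant-digit radix sort sketch along those lines, again my own illustration rather than anything from the lecture: it assumes non-negative integers and does one stable bucketing pass per digit, which is essentially a counting sort pass (base-10 digits here; binary works the same way).

```python
def radix_sort(data, base=10):
    """Assumes data is a list of non-negative integers."""
    if not data:
        return []
    place = 1
    while place <= max(data):
        # Stable bucketing by the current digit (a counting-sort style pass).
        buckets = [[] for _ in range(base)]
        for x in data:
            buckets[(x // place) % base].append(x)
        data = [x for bucket in buckets for x in bucket]
        place *= base                          # move on to the next digit
    return data

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))   # [2, 24, 45, 66, 75, 90, 170, 802]
```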
So, summarizing: a comparison-based sort is one that can only access the input array through this API that lets you compare two elements. You cannot access the value of an element, so in particular you cannot do any kind of bucketing technique. Bucket sort, counting sort, and radix sort are all fundamentally doing some kind of bucketing, and that's why, when you're willing to make assumptions about what the data is and how you're permitted to access it, you can bypass, in all of those cases, this n log n lower bound. But if you're stuck with a comparison-based sort, if you want something general-purpose, you're going to be doing n log n comparisons in the worst case. Let's see why.

So we have to prove a lower bound for every single comparison-based sorting method, so fix one. And let's focus on a particular input length; call it n. Now, let's simplify our lives. Since we're focused on a comparison-based sorting method, one that doesn't look at the values of the array elements, just at their relative order, we may as well think of the array as just containing the elements one, two, three, all the way up to n, in some jumbled order. Some other algorithm could make use of the fact that everything is a small integer, but a comparison-based sorting method cannot. So there's no loss in just thinking about an unsorted array containing the integers one through n inclusive. Now, despite seemingly restricting the space of inputs we're thinking about, even here there are a lot of different inputs we've got to worry about, right? n elements can show up in n! different orderings: there are n choices for the first element, then n minus one choices for the second element, n minus two choices for the third element, and so on. So there are n! possibilities for how these elements are arranged in the input array. I don't want to prove this super formally, but I want to give you the gist, which I think is the right intuition. We're interested in lower bounding the number of comparisons this method makes in the worst case.

So let's introduce a parameter k, which is its worst-case number of comparisons. That is, on every one of these n! inputs, by assumption, this method makes no more than k comparisons. The idea behind the proof is that, because we have n! fundamentally different inputs, the sorting method has to execute in a fundamentally different way on each of them. But since the only thing that causes a branch in the execution of the sorting method is the resolution of a comparison, and there are only k comparisons, it can have only 2^k different execution paths. That forces 2^k to be at least n!, and a calculation then shows that this forces k to be at least Omega(n log n). So let me just quickly fill in the details.

Across all n! possible inputs, just as a thought experiment, we can imagine running this method n! times and looking at the pattern of how the comparisons resolve. For each of these n! inputs, we run it through the sorting method; it makes comparison number one, then comparison number two, then comparison number three, then comparison number four, then comparison number five, and, say, it gets back a zero, then a one, then a one, then a zero. Give it some other input and it gets back a one, then a one, then a zero, then a zero, and so on. The point is, for each of these n! inputs it makes at most k comparisons, so we can associate it with a k-bit string, and because there are only k bits, we can only ever see 2^k different k-bit strings, 2^k different ways a sequence of comparisons can resolve.

Now, to finish the proof, we're going to apply something I don't get to use as much as I'd like in an algorithms class, but it's always fun when it comes up: the pigeonhole principle. The pigeonhole principle, you recall, is the essentially obvious fact that if you try to stuff k plus one pigeons into just k cubbyholes, at least one of those cubbyholes has to get at least two pigeons. So for us, what are the pigeons and what are the holes? Our pigeons are these n! different inputs, the different ways you can scramble the integers one through n. And what are our holes? Those are the 2^k different executions the sorting method can possibly take. Now, if the number of comparisons k is so small that 2^k, the number of distinct ways the comparisons can resolve themselves, is less than the number of different inputs that have to be correctly sorted, then by the pigeonhole principle, one hole gets two pigeons. That is, two different inputs get treated in exactly the same way by the sorting method: they are asked exactly the same k comparisons, and the comparisons resolve identically. You feed it one jumbling of one through n and you get, say, 01101; then you feed it a totally different jumbling of one through n, and again you get 01101. And if this happens, the algorithm is toast, in the sense that it's definitely not correct, right? Because we've fed it two different inputs, and it's unable to resolve which of the two it is. It may be one permutation of one through n, or this totally different permutation of one through n. The algorithm has tried to learn what the input is through these k comparisons, but it has exactly the same data about the input in the two cases.
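As a side illustration (my own, not part of the lecture's argument): you can actually watch this counting idea in action for small n by instrumenting a real comparison-based sort and recording its comparison transcripts. The helper name `transcript` is made up for the example; the only library features used are Python's built-in sorted, cmp_to_key, permutations, and factorial.

```python
from functools import cmp_to_key
from itertools import permutations
from math import factorial

def transcript(arr):
    """Sort arr using only comparisons; return the sequence of comparison outcomes."""
    outcomes = []
    def cmp(a, b):
        outcomes.append(a > b)                # record which side was bigger
        return (a > b) - (a < b)
    assert sorted(arr, key=cmp_to_key(cmp)) == sorted(arr)   # sanity check: it sorts
    return tuple(outcomes)

n = 4
# One transcript per input ordering of 1..n; a correct comparison-based sort
# cannot give two different inputs the same transcript.
transcripts = {transcript(list(p)) for p in permutations(range(1, n + 1))}
K = max(len(t) for t in transcripts)          # worst-case number of comparisons observed

print(f"{factorial(n)} inputs, {len(transcripts)} distinct transcripts, K = {K}")
assert len(transcripts) == factorial(n)       # distinct inputs => distinct transcripts
assert 2 ** K >= factorial(n)                 # the 2^K >= n! bound from the lecture
```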
So, if it outputs the correct sorted version in one case, it's going to get the other one wrong. You can't have a common execution of a sorting algorithm unscramble two totally different permutations; it can't be done. So what have we learned? We've learned that, by correctness, 2^k is in fact at least n!. So how does that help us? Well, we want to lower bound k; k is the number of comparisons this arbitrary sorting method uses, and we want to show it's at least on the order of n log n. So to lower bound k, we'd better lower bound n!. You could use Stirling's approximation or something fancy, but we don't need anything fancy here; we'll just do something super crude. We'll just say: look, n! is the product of n things, n times n minus one times n minus two, and so on, and the largest n over two of those n terms are each at least n over two. The rest we'll just ignore. Pretty sloppy, but it gives us a lower bound of n over two raised to the power n over two. Now we just take log base two of both sides, and we get that k is at least n over two times log base two of n over two, also known as Omega(n log n). And that, my friends, is why any correct deterministic sorting algorithm that's comparison-based has got to use n log n comparisons in the worst case.
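Spelled out as one chain, this is the same crude calculation written as formulas:

```latex
\[
  2^{k} \;\ge\; n! \;\ge\; \Bigl(\tfrac{n}{2}\Bigr)^{n/2}
  \quad\Longrightarrow\quad
  k \;\ge\; \tfrac{n}{2}\,\log_2\!\Bigl(\tfrac{n}{2}\Bigr) \;=\; \Omega(n \log n).
\]
```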