0:17

Let's take a quick review of where we left off with sorting.

So we considered a number of sorting algorithms,

starting with insertion sort, and then mergesort, quicksort and heapsort.

And we got to the point where we found an algorithm,

heapsort, that guarantees to sort N items in time

proportional to N lg N, without using any extra space.

Unfortunately, not stable.

And all these algorithms are useful

for any type of generic key,

as long as it implements the compareTo operation.

And not only that, we proved that any algorithm that just uses compares

has to use a number of compares proportional to N lg N.

So in a very important sense, mergesort or

heapsort, for example, are optimal.

You can't use asymptotically fewer compares for either one,

and with heapsort you can't use less extra space.

So why do we consider other sorting algorithms?

1:39

There's a lower bound, why are we thinking about this?

And the question is, can we do better?

And obviously, we're here because the answer is that

we can do better, if we don't depend on compares.

The one assumption made by the lower bound is that we use compares, but

we don't always need to use compares.

And so, let's look at an example.

Key-indexed counting is a fine example of that, and

it's representative of a fairly common situation in sorting applications,

where the keys that we're using to sort happen to be small integers.

So in this case, this is supposed to mimic an application where there are students,

and they're assigned to sections,

there are not too many sections, and we want to get the thing sorted.

3:26

Now don't forget that we're sorting according

to a sort key, but usually we're sorting bigger

generic items that have other information associated with them.

If you were just sorting the small integers, you could just count how many

1's there are, how many 2's there are, and like that.

And then, in one pass, if there are three 1's, just output three 1's, and

so forth.

But the complication is that we have to carry the associated information along,

so we have to work a bit harder than that.

So here's the code for this method called KeyIndexedCounting, and

let's look at a demo.
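The code itself isn't shown in this transcript, so here's a sketch of key-indexed counting along the lines described, assuming the keys are lowercase letters a through f (a for 0, b for 1, and so on). The class name and the choice of char keys are just for illustration; it's not the course's exact implementation.

```java
// Sketch of key-indexed counting for keys 'a'..'a'+R-1 stored in a[].
// Plan: count frequencies, compute cumulates, move items, copy back.
public class KeyIndexedCounting {
    public static void sort(char[] a, int R) {
        int n = a.length;
        char[] aux = new char[n];
        int[] count = new int[R + 1];

        // Count frequencies, offset by 1: an 'a' (key 0) bumps count[1].
        for (int i = 0; i < n; i++)
            count[a[i] - 'a' + 1]++;

        // Compute cumulates: count[r] becomes the number of keys less than r.
        for (int r = 0; r < R; r++)
            count[r + 1] += count[r];

        // Move items: each key goes where its cumulate points, then
        // the cumulate is incremented for the next equal key.
        for (int i = 0; i < n; i++)
            aux[count[a[i] - 'a']++] = a[i];

        // Copy back.
        for (int i = 0; i < n; i++)
            a[i] = aux[i];
    }
}
```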

4:12

So here's the KeyIndexedCounting demo.

Now, to make this a little less confusing, with not so many numbers,

we're going to use lowercase a for 0, b for 1, c for 2, and so on.

So each letter stands for its position in the alphabet minus a,

however you want to think of it, and we're only going to look at six key values.

So we're supposing that we're sorting this array that has six different small

integers.

And we're using lowercase letters to represent the integers, so

that we can easily distinguish between the keys and the indices.

So now let's look at the processing for this.

So the first thing that we do, is we go through and

we count the frequency of occurrence of each letter.

So the way that we do that, is we keep an array.

Now, the array actually has to be one bigger than the number of different

keys that we have, the number of different small integers that we have.

So in this case, it's an array of size seven.

And just to make the code a little cleaner, we keep the number of a's in

count[1], the number of b's in count[2], and so forth.

And once we set that up,

it's trivial to go ahead and count the frequencies.

We simply go through our input, for i from 0 to n.

And when we access a value in our input, it's a small integer,

so it's 0, 1, 2, 3, 4, or 5.

And we simply add 1 to that integer, and use it to index into the array.

So when we see an a, that's 0, we're incrementing count[1];

when we see a b, that's 1, we're incrementing count[2], and so forth.

So in this case we increment count corresponding to d,

and then a, and c like that.

So every time we encounter a new key,

we just simply increment one of these counters.

In one pass-through, we get an array that gives us the number of a's,

b's, c's, d's, e's, and f's.

That's the first pass of key-indexed counting:

count the frequencies of each letter, using the key as an index.
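As a sketch, the counting pass on its own might look like this, assuming char keys a through f; the input order in the example is reconstructed from the trace in this demo, and the class name is just for illustration.

```java
// The counting pass: the count array is one bigger than R, and counts
// are kept offset by one, so count[1] ends up holding the number of
// a's, count[2] the number of b's, and so forth.
public class CountPass {
    public static int[] counts(char[] a, int R) {
        int[] count = new int[R + 1];
        for (int i = 0; i < a.length; i++)
            count[a[i] - 'a' + 1]++;   // an 'a' (key 0) bumps count[1]
        return count;
    }
}
```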

6:33

Now the next step is called computing cumulates,

and that's a really easy thing as well.

All we do is we go through the count array, and simply,

at each step, we add the current one to the sum computed so far.

So looking back, we had two a's and three b's.

That means there are five letters less than c, that's the a's and the b's.

And there are six letters less than d, and eight letters less than e, and so forth.

And that's obtained by starting with 2, adding 3 to it to

get 5, adding 1 to it to get 6, and so on.

And with that one pass through the count array, then we can find out,

for example, that there are six keys less than d, and eight keys less than e.

And those cumulates tell us where the d's go in the output.

There's 6 keys less than d, and 8 keys less than e,

so the d's have to go in a[6] and a[7].

So this is an array of indices that is going to tell us how to

distribute the keys in the output.
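The cumulates pass can be sketched as a single loop over the count array; the method name here is just for illustration.

```java
// Computing cumulates in place: after the pass, count[r] holds the
// number of keys strictly less than key r (with a = 0, b = 1, ...).
public class Cumulates {
    public static void cumulate(int[] count) {
        for (int r = 0; r < count.length - 1; r++)
            count[r + 1] += count[r];   // add the running sum forward
    }
}
```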

So that's the next step, access the cumulates,

using the key as index to move items.

So let's take a look.

Remember, when we see an a, we're just going to count that as 0.

So we're going to go to count[0], and that will access this entry in the count array.

So we go through the whole array to be sorted, and we move each key

exactly to where it has to go, and we'll do that one at a time now.

So when i is 0, we're looking at the d.

The count array entry corresponding to d has 6, so it says,

just put the d in position 6, and increment that entry.

That means if you have another d, it's going to go into 7.

And the way we precomputed these counts,

the keys are not going to run into one another.

So now a, now that goes in 0, and

we increment the count array corresponding to a.

Next thing is c, and so that says to put it in 5,

and then increment the count array corresponding to c.

And f, let's put it in 9.

Next is another f, which we put in 10.

Next is a b, which we put in 2.

So you can see, the keys from the input are getting distributed in the output

according to the counts and the cumulates that we've pre-computed.

So now we get the other d, which goes into 7.

We get another b, which goes into 3, and

we increment the entry to 4, which is where the next one goes.

f goes into 11, the last b goes into 4,

the e goes into 8, and the second a goes into 1.
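The distribution pass just traced can be sketched as follows; the input order is reconstructed from the trace, and the class name is just for illustration.

```java
// The distribution pass: each key a[i] goes to aux[count[key]], and
// that cumulate is incremented so the next equal key lands one slot
// later -- which is also why the method is stable.
public class MovePass {
    public static char[] distribute(char[] a, int[] count) {
        char[] aux = new char[a.length];
        for (int i = 0; i < a.length; i++)
            aux[count[a[i] - 'a']++] = a[i];   // move, then bump the cumulate
        return aux;
    }
}
```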

10:50

And the key fact to note is that it takes time proportional to N + R,

and space proportional to N + R.

Now R, remember, is the number of different character values, so

for ASCII, maybe that's

256, and for genomic data, maybe it's 4.

And as for N, we're assuming we're sorting huge files, so really,

this is linear time in many, many practical situations.

11:27

There's also the question of, is it stable?

Yeah, it's actually stable.

Because when we do the move,

we move items with equal keys in the order that we see them.

We keep them in that order, and

that's just the way the method works.

So for this special situation, we have a linear-time,

stable sorting method, which beats the N lg N bound,

and is useful in many practical situations.
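To connect this back to the students-and-sections application: here's a hedged sketch of key-indexed counting over records, where the small-integer key (a section number) carries a name along with it, and equal sections keep their input order. The names, section numbers, and class name are made up for illustration.

```java
// Key-indexed counting over records: sort names by section number,
// stably, carrying the associated name along with each key.
// Section numbers are assumed to lie in 0..R-1.
public class SectionSort {
    public static String[] sortBySection(String[] names, int[] sections, int R) {
        int n = names.length;
        int[] count = new int[R + 1];
        // Count frequencies, offset by one.
        for (int i = 0; i < n; i++) count[sections[i] + 1]++;
        // Compute cumulates.
        for (int r = 0; r < R; r++) count[r + 1] += count[r];
        // Move the names; equal sections keep their input order.
        String[] aux = new String[n];
        for (int i = 0; i < n; i++) aux[count[sections[i]]++] = names[i];
        return aux;
    }
}
```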