0:00

In this next series of videos, we'll get some more practice applying the divide and

conquer algorithm and design paradigm to various problems. This will also give us a

glimpse of the kinds of application [inaudible] to which it's been

successfully applied. We're gonna start by extending the merge sort algorithm to

solve a problem involving counting the number of inversions of an array. Before

we tackle the specific problem of counting the number of inversions in an array, let

me say a few words about the divide and conquer paradigm in general. So again,

you've already seen the totally canonical example of divide and conquer, namely

merge sort. So the following three conceptual steps will be familiar to you.

The first step, no prizes for guessing is you divide. The problem. Into smaller

sub-problems. Sometimes this division happens only in your mind. It's really

more of a conceptual step than part of your code. Sometimes you really do copy

over parts of the input into say new arrays to pass on to your recursive calls.

The second step, again no prizes here, is you conquer the sub-problems just using

recursion. So for example, in Merge Sort, you conceptually divide the array into two

different pieces. And then you [inaudible] with the conquer or sort to the first half

of the array. And you, you do the same thing with the second half of the array.

Now, of course, it's not quite as easy as just doing these two steps. Dividing the

problem, and then solving the sub problems recursively. Usually, you have some extra

cleanup work after the recursive calls, and to stitch together the solutions to

the sub problems into one for the big problem, the problem that you actually

care about. Recall, for example, in Merge Sort, after our recursive calls, the left

half of the array was sorted, the right half of the array was sorted. But we still

had to stitch those together. Merge into a sorted version of the entire array. So the

[inaudible] step is to combine. The solutions to the subproblem into one

problem. Generally the largest amount of ingenuity happens in the third step. How

do you actually quickly combine solutions to subproblems into one to the original

problem? Sometimes you also get some cleverness in the first step with

division. Sometimes it's as simple as just spliting a ray in two. But there are cases

where the division step also has some ingenuity. Now let's move on to the

specific problem of counting inversions and see how to apply this divide and

conquer paradygm. So let begin by defining the problem formally now. We're given as

input an array A with a length N. And you can define the problem so that the array a

contains any ole distinct numbers. But, let's just keep thing simple and assume

that it contains the numbers one through n. The integers in that range in some

order. That captures the essence of the problem. And the goal is to compute the

number of inversions of this array so what's an inversion you may ask well an

inversion is just a pair of array [inaudible] I and J with I smaller than J

so that earlier array entry the I entry is bigger than the latter one the Jake one so

one thing that should be evident is that if the array contains these numbers in

sorted order if the array is simply one two three four all the way up to N then

the number of inversions is zero. The converse you might also want to think

through if the array has any other ordering of the numbers between one and N

other than the assorted one, then it's going to have a non. Of zero number of

inversions. Let's look at another example. So'spose we have an array of six entries.

So the numbers one thru six in the following order. One, three, five followed

by two, four, six. So how many inversions does this array have? So again what we

need to look for are pairs of array entries so that the earlier or left entry

is bigger than the later or right entry. So one example which we see right here

would five and two. Those are right next to each other and out of order, the

earlier entry is bigger than the other one. But there's others, there's three and

two for example Those are out of order. And, five and four are also out of order.

And I'll leave it to you to check that those are the only three pairs that are

out of order. So summarizing the inversions in this array of length six are

3-2, 5-2, and 5-4. Corresponding to the array entries, 2-4, 3-4, and 3-5.

Pictorially, we can think of it thusly, we can first. Write down the numbers in

order, one up to six. And then we can write down the numbers again but, ordered

in the way that their given in the input array. So, one three five two four six.

And then we can connect the dots, meaning we connect one to one. Reconnect two to

two, and so on. It turns out, and I'll leave to for you to, to think this

through, that the number of crossing pairs of line segments prescisely correspond to

the number of inversions. So we see that there are one, two, three crossing line

segments. And these are exactly in correspondence with the three inversions,

we found earlier. Five and two, three and two, and five and four. Now, [inaudible]

wanna solve this problem you might ask. Well there's few reasons that come up. One

would be to have a numerical similarity measure that quantifies how close to

[inaudible] lists are to each other. So for example, suppose I took you and a

friend, and I took, identified ten movies that both of you had seen. And I asked

each of you to order, or to rank these movies from your most favorite to your

least favorite. Now I can form an array, and compute inversions. And it quantifies,

in some sense, how dissimilar your two rankings are to each other. So in more

detail, in the first entry of this array, I would put down the ranking that your

friend gave to your favorite movie. So if you had your favorite movie, Star Wars or

whatever. And your friend only thought it was the fifth best out of the ten, then I

would write down a five in the first entry of this array. Generally, I would take

your second favorite movie. I would look at how your friend ranked that. I would

put that in the second entry of the array and so on, all the way up to the tenth

entry of the array, where I would put your friend's ranking of your least favorite

movie. Now, if you have exactly identical preferences, if you rank them exactly the

same way, the number of inversions of this array would be zero. And in general, the

more inversions this array has, it quantifies that your lists look more and

more different from each other. Now why might you want to do this why might you

want to know whether two different people ranked things in the similar way had

similar preferences well one reason might be what's called collaborative filtering,

probably many of you have had the experience of going to a website and if

you've made a few purchases through this website it starts recommending further

purchases for you, so one way to solve this problem under the hood, is to look at

your purchases look at what you seem to like, find other people who have similar

preferences similar history look at things they've bought that you haven't, and then

recommend. New products to you based on what similar customers seemed to have

bought. So this problem captures some of the essence of identifying which customers

or which people are similar based on data about what they prefer. So just to make

sure we're all on the same page, let me pause for a brief quiz. We've already

noticed that a given array will have zero inversions, if and only if it's in sorted

order. If it only contains the numbers of one through N in order. So, on the other

side, what is the largest number of inversions an array could possibly have?

Let's say, just for an array of size six, like the one in this example here. So the

answer to this question is the first one. Fifteen. Or in general in an N. Element

array the largest number of inversions is N. Choose two. Also known as N times N

minus one over two. Which, again, in the case of a [inaudible] is going to evaluate

to fifteen. The reason is, the worst case is when the array is in backwards order,

reverse [inaudible] order, and every single pair of [inaudible] indices is

inverted. And so the number of indices IJ, with I less than J is precisely

[inaudible] too. Let's now turn our attention to the problem of computing the

number of inversions of an array as quickly as possible. So one option that is

certainly available to us is the brute force algorithm. And by brute force I just

mean we could set up a double four loop. One which goes through I, one which goes

through J bigger than I, and we just check each pair IJ individually with I less than

J whether that particular pair of array entities AI and AJ is inverted and if it

is then we add it to our running count. And then we return the final count at the

end of the double four loop. That's certainly correct. The only problem is, as

we just observed, there's N [inaudible] two or a quadratic number of potential

inversions so this algorithm's almost going to run in time quadratic in the

array link. Now remember the mantra of any good algorithm designer. Can we do better?

And the answer is yes. And the method we'll be using, divide and conquer. The

way in which we'll divide will be motivated directly by merge sort where we

recurs e separately on the left and the right half's of the array. We're gonna do

the same thing here. To understand how much progress we can make purely using

recursion let's classify the inversions of array into one of three types. So suppose

we have an inversion of an array I, J, and remember in an inversion you always have I

less than J. We're gonna call it a left inversion. If both of the array indices

are at most N over two, where N is the array length. We're gonna call it a right

inversion if they're both strictly greater than N over two. And we're gonna call it a

split inversion if the smaller index is at most N over two and the larger index is

bigger than N over two. We can discuss the progress made by recursion in these terms.

When we recurse on the left-half of an array, if we implement our algorithm

correctly, we'll successfully be able to count all of the inversions located purely

in that first half. Those are precisely the left inversions. Similarly, a second

recursive call on just the right half of an array, the second half of an

[inaudible] array will successfully count all of the right inversions. There remains

the questions of how to count the split inversions. But we shouldn't be surprised

there's some residual work left over, even after the recursive calls do their job.

That, of course, was the case at Merge Short, where [inaudible] magically took

care of sorting the left half of the array, sorting the right half of the

array. But there was still, after their return, the matter of merging those two

sorted lists into one. And here again, after the recursion is gonna be the matter

of cleaning up and counting the number of split inversions. So for example if you go

back to the six element array we worked through before, 135246, you'll notice that

there, in fact, all of the inversions are split. So the recursive calls will both

come back counting zero inversions. And all of the work for that example will be

done by the count split inversions subroutine. So let's summarize where

things stand given underspecified high level description of the algorithm as we

envision it. There is a base case. I'll go ahead and write it down for completeness,

which is if we're given a one element array, then there's certainly no inversion

so we can just immediately return the answer zero. For any bigger array, we're

going to divde and conquer. So we'll count the left inversions with a recursive call.

The right inversions with a recursive call. And then we'll have some currently

unimplemented subroutine that counts the split inversions. Since every inversion is

either left or right, or split, and can't be any more than one of those three, then,

having done these three things, we can simply return their sum. So that's our

high level attack on how we're gonna count up the number of inversions. And of

course, we need to specify how we're gonna count the number of split inversions. And

moreover, we lack that subroutine to run quickly. An analogy to emerge short,

where, outside the recursive calls, we did merely linear work. Outs-, in the merge

subroutine. Here, we'd like to do only linear work in counting up the number of

split inversions. If we succeed in this goal, if we produce a correct and linear

time of limitation to count up the number of split incursions, then this entire

recursive algorithm will run in big O. Of N. Log in time. The reason the overall out

rhythm will run in O. Of N. Log in time is exactly the same reason that merge short

ran in N. Log in time. There's two recursive calls. Each on a problem of

one-half the size. And outside of the recursive calls we would be doing linear

work. So you could copy down exactly the same recursion tree argument we used for

merge short. It would apply equally well here. Alternatively, very soon we will

cover the master method, and as one very special case it will prove that this

algorithm, if we can implement it thusly, will run in O. Of N. Log in time. Now one

thing to realize, is this is a fairly ambitious goal, to count up the number of

split inversions in linear time. It's not that there can't be too many split

inversions. There can actually be a lot of them. If you have an array where the first

half of the array contains the numbers N over two plus one, up to N. Whereas the

second part of the array contains the numbers one up to N over two, that has a

quadratic number of inversions, all of which are split. So, what we're attempting

to do here is count up a quadratic number of things using only linear time. Can it

really be done? Yes is can, as we'll see in the next video.