
The next problem that we're going to attack using the dynamic programming

technique is the so-called chain matrix multiplication problem.

The input in this problem consists of n matrices,

which we denote here by A0, A1, and so on up to A(n-1).

And what we need to find is the minimum cost of multiplying these n matrices.

We will now clarify what we mean by the cost.

Okay, first of all, recall that in order to be able to multiply

two matrices A and B, we need to ensure that the number of columns

of the matrix A is equal to the number of rows of the matrix B.

This basically means that the number of columns of the matrix Ai

should be the same as the number of rows of the matrix Ai + 1, right?

So this allows us to denote the sizes of our n

matrices by m0 times m1, m1 times m2, and so on.

So basically, the size of the matrix Ai is

equal to mi times m(i+1), okay?

Then recall that even for square matrices,

matrix multiplication in general is not commutative.

Namely, there are matrices A and B such that A times B is not equal to B times A.

This means actually that even if our matrices were square,

we would not be able to actually swap the order of matrices.

The same holds for four matrices: we cannot change their order.

However, since matrix multiplication is associative,

if we need to multiply A by B by C by D,

then there are at least two possible orders of performing the multiplications.

We may first multiply A by B, then C by D, then multiply the two results;

or we may first multiply B by C, then multiply A by the result,

and finally multiply that by D.

Okay, so the cost of multiplying two matrices of dimensions p by q and

q by r we define to be equal to pqr.

This actually reflects the following fact.

So the resulting matrix has dimensions p by r, right?

So it has p by r cells and to compute the content of each cell, we need to multiply

a row of the first matrix of size q by a column of the second matrix of size q.

So it definitely can be done in time big O of q.

So definitely we can compute the product in time big O of pqr.

In fact faster algorithms are known for computing the product of two matrices,

but for the time being let's just assume that the cost of multiplying two matrices

p by q and q by r is just equal to pqr, without any hidden constants.
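To make this cost measure concrete, here is a minimal sketch of the standard three-loop multiplication of a p-by-q matrix by a q-by-r matrix, with a counter for the scalar multiplications performed (the function name and the `ops` counter are illustrative, not from the lecture):

```python
# Standard O(pqr) multiplication of a p-by-q matrix A by a q-by-r
# matrix B; `ops` counts the scalar multiplications performed.
def multiply(A, B):
    p, q, r = len(A), len(B), len(B[0])
    assert len(A[0]) == q          # columns of A must equal rows of B
    C = [[0] * r for _ in range(p)]
    ops = 0
    for i in range(p):             # each of the p rows of the result
        for j in range(r):         # each of the r columns of the result
            for k in range(q):     # dot product of length q
                C[i][j] += A[i][k] * B[k][j]
                ops += 1
    return C, ops                  # ops is exactly p * q * r
```

Running it on small matrices confirms that the work done is exactly p times q times r scalar multiplications.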

And before designing an algorithm, let me show you that even for

small data sets, for example for four matrices,

the order of multiplication actually matters.

So consider four matrices, A, B, C, and D of the following dimension.

So 50 by 20, 20 by 1, 1 by 10, and 10 by 100, and in the first

case assume that we compute their product as follows.

So we first multiply B by C then multiply by D and

finally we multiply A by the result.

So after the first multiplication, namely B by C,

we get a matrix BC of dimensions 20 by 10, and

we pay for it 20 times 1 times 10, right?

Our next operation is multiplying this result by D.

Then what we get is a matrix of dimensions 20 by 100,

and we pay for it 20 times 10 times 100, okay?

Then finally, we multiply A by the resulting matrix,

and we pay for it 50 times 20 times 100.

As a result, the total cost of this order is roughly 120,000, okay.

Now let me show you that with some other order of multiplication of the same

four matrices, the cost is much lower.

Okay, so assume that we first multiply A by B and then C by D, okay?

So for the first multiplication, A by B, we pay actually 50 by 20 by 1.

And you see that we have this very thin matrix.

Then we multiply C by D, and we pay for this 1 times 10 times 100.

And again we see that here we get a matrix whose height is very low.

And finally we multiply these two resulting matrices,

and we pay 50 times 1 times 100, so

the resulting cost in this case is just 7,000,

which is much lower than in the previous case.
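The two totals are easy to verify by summing the individual p times q times r costs; a quick sanity check, not code from the lecture:

```python
# Dimensions: A is 50x20, B is 20x1, C is 1x10, D is 10x100.
# First order:  A * ((B * C) * D)
cost_first = 20 * 1 * 10 + 20 * 10 * 100 + 50 * 20 * 100
# Second order: (A * B) * (C * D)
cost_second = 50 * 20 * 1 + 1 * 10 * 100 + 50 * 1 * 100
print(cost_first, cost_second)
```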


Right, now an essential observation that will help us to

define a subproblem in this case, and

finally to design a dynamic programming solution, is the following.

Any order of multiplication of our matrices can be

conveniently represented as a binary tree,

where the leaves are our initial matrices,

and each internal node multiplies the results of its two children.

So for example, the first tree shown here on the slide

corresponds to the order where we first multiply A by B, then

multiply the result by C, and then multiply the result by D.

The second one corresponds to the order where we first multiply B by C and

then multiply the result by D.

And then finally we'll multiply A by the result and so on.

So in general, if we consider such a tree, not for four matrices but

for n matrices, let's look, for example, at four of its subtrees.

Okay, then what we see is that the left subtree here

multiplies matrices starting from A0 to A(i-1).

The next one multiplies matrices starting

from Ai to A(j-1).

The next one goes from Aj to A(k-1) for some k, and finally the last one,

the rightmost one, multiplies matrices starting from Ak to A(n-1), right?

So the pattern that emerges here is that any subtree

in such a binary tree multiplies matrices from i

to j- 1 for some parameters i and j, okay.

And this suggests that we define a subproblem as follows.

So let M (i,j) be the minimum cost of

multiplying matrices from i to j- 1, okay?

So we know that it can be represented as a tree.

And in turn, consider such a tree:

it has a root and two subtrees, right?

And we know that the left subtree computes

the product of matrices from i to k-1 for

some k, and the right subtree from k to j-1.

So what we basically need to do to compute the value of M(i,

j) is to go through all potential values of k, and

this is exactly how we define the recurrence relation.


So finally M(i,j) is actually equal to the minimum over all

k which is greater than i and smaller than j, of the following quantity.

M(i,k), this is the cost of multiplying matrices from i to k- 1 + M(k,j),

this is the cost of multiplying matrices from k to j- 1,

and plus the cost of multiplying two results.

And the results actually have dimensions mi times mk, and mk times mj.

So the cost of multiplying these two matrices is mi times mk times mj.

Okay, there is also base case where j = i + 1,

in this case we have a single matrix, so

there is nothing to multiply, so the cost in this case is equal to 0.
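Putting the recurrence and its base case together in one formula, with M(i,j) and the dimensions mi exactly as defined above:

```latex
M(i, j) =
\begin{cases}
0, & j = i + 1,\\[2pt]
\min\limits_{i < k < j} \bigl( M(i, k) + M(k, j) + m_i \, m_k \, m_j \bigr), & \text{otherwise.}
\end{cases}
```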

Okay, so once we have a recurrence relation,

as usual, it is just a technicality

to convert it into a recursive algorithm.

So you see this algorithm here on this slide.

So again, we have the table T, which is just a dictionary.

What we need to compute is M(i,j),

so we pass the two parameters i and j to our procedure, and

we only start computing it if it is not in our table yet.

So if j = i + 1, then we are in the base case.

Then we say that then the result = 0.

Otherwise, we compute it using our recurrence relation by making recursive

calls where k ranges from i + 1 to j- 1.

So for each such recursive call we check

whether the resulting value improves our current solution.

If it improves, we actually store it, okay?
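The recursive algorithm just described can be sketched in Python as follows; the function name `matrix_chain_cost` and passing the dimensions as a list m, with Ai of size m[i] by m[i+1], are my conventions, not the slide's:

```python
# Memoized recursive computation of M(i, j), the minimum cost of
# multiplying matrices A_i ... A_{j-1}.  Matrix A_i has size
# m[i] x m[i+1], so len(m) - 1 matrices in total.
def matrix_chain_cost(m):
    n = len(m) - 1          # number of matrices
    T = {}                  # table of already computed subproblems

    def M(i, j):
        if (i, j) not in T:
            if j == i + 1:  # base case: a single matrix, nothing to do
                T[(i, j)] = 0
            else:
                # try every split point k and keep the cheapest one
                T[(i, j)] = min(M(i, k) + M(k, j) + m[i] * m[k] * m[j]
                                for k in range(i + 1, j))
        return T[(i, j)]

    return M(0, n)
```

On the four matrices from the example, `matrix_chain_cost([50, 20, 1, 10, 100])` returns the optimal cost 7,000.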

So in order to convert this recursive algorithm into an iterative one,

we need to think a little in this case,

because it is not immediately clear what the right order of the subproblems is.

Namely, what we need to do is to go from smaller

subproblems to larger subproblems.

Since in this case each subproblem is actually specified by two parameters i and

j, it is natural to use a two-dimensional table to store all the results.

At the same time we cannot fill in this table for

example row by row or column by column.

Because that would not let us go from smaller

subproblems to larger subproblems.

And at this point it is important to define what is the size of the subproblem.

It is natural to say that the size of a subproblem is the number

of matrices that needs to be multiplied, right.

So the size of a subproblem is just j - i, which basically means

that we need to start with size equal to 1, then proceed to

size equal to 2, then proceed to size equal to 3, and so on.

So we fill in our matrix diagonal by diagonal, like this, right?

And this gives us the following code.

So in this case we used parameter s, which actually denotes the size.

So when s is equal to 0 or equal to

1 this is just our base case, so we initialize all such cells with 0.

Then we gradually increase s from 2 to n.

And for each such s, we consider all the subproblems whose size is s.

And we do this just by iterating over all i, and

j is computed as i + s in this case.

And then we compute the value using our recurrence relation.

So this gives us an algorithm whose running time is cubic, O(n cubed).

This is basically just because we have three nested loops.

To put it otherwise, we have n squared subproblems and for

each subproblem we need a linear amount of work to compute the result.
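The iterative, diagonal-by-diagonal version can be sketched like this (again assuming the dimensions come as a list m, with Ai of size m[i] by m[i+1]):

```python
# Bottom-up computation of the minimum multiplication cost.
def matrix_chain_iterative(m):
    n = len(m) - 1
    # T[i][j] holds the minimum cost of multiplying A_i ... A_{j-1};
    # the base-case cells (j == i + 1) stay initialized to 0.
    T = [[0] * (n + 1) for _ in range(n + 1)]
    # fill the table diagonal by diagonal: s = j - i is the
    # number of matrices in the subproblem
    for s in range(2, n + 1):
        for i in range(n - s + 1):
            j = i + s
            T[i][j] = min(T[i][k] + T[k][j] + m[i] * m[k] * m[j]
                          for k in range(i + 1, j))
    return T[0][n]
```

The three nested loops over s, i, and k are what gives the cubic running time.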

So finally, to unwind a solution, we basically need

to go from the final cell, the one corresponding to the whole product,

and at each step recover the value of k that achieved the minimum.

So by unwinding a solution in this manner,

we will actually construct the tree, right?

Which represents the optimal order of multiplying our matrices.
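Unwinding can be implemented by remembering, for every cell, the split point k that achieved the minimum; a sketch under the same dimension-list convention as before, where the helper names `best_k` and `unwind` are mine:

```python
# Compute the optimal cost and reconstruct the optimal order
# of multiplication as a parenthesized expression.
def optimal_order(m):
    n = len(m) - 1
    T = [[0] * (n + 1) for _ in range(n + 1)]
    best_k = [[0] * (n + 1) for _ in range(n + 1)]   # winning split points
    for s in range(2, n + 1):                        # subproblem size
        for i in range(n - s + 1):
            j = i + s
            # min over (cost, k) pairs keeps both the best cost
            # and the k that achieved it
            T[i][j], best_k[i][j] = min(
                (T[i][k] + T[k][j] + m[i] * m[k] * m[j], k)
                for k in range(i + 1, j))

    def unwind(i, j):                                # rebuild the tree
        if j == i + 1:
            return "A%d" % i
        k = best_k[i][j]
        return "(%s %s)" % (unwind(i, k), unwind(k, j))

    return T[0][n], unwind(0, n)
```

For the four matrices of the example, this returns cost 7,000 together with the order ((A0 A1) (A2 A3)), i.e., multiply A by B and C by D first, then multiply the two results.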

Finally, one can arrive at the right notion of a subproblem

from a different side, namely by optimizing a brute force solution.

One can implement a brute force solution as follows:

the brute force in this case is to enumerate all possible binary trees,

go over all of them, and find the best one.

And to optimize it, one just notices that when we compute

the value of a subproblem,

we do not need to know the shape of the tree.

We only need to know which matrices

are multiplied in this particular subtree.