0:30

And when you're implementing machine learning algorithms, if you're able to take advantage of these linear algebra libraries, or these numerical linear algebra libraries, and make calls to routines in them rather than, sort of, writing code yourself to do things that these libraries could be doing.

And if you do that, then often you get code that, first, is more efficient, so it just runs more quickly and takes better advantage of any parallel hardware your computer may have, and so on. And second, it also means that you end up

with less code that you need to write. So you have a simpler implementation that is, therefore, maybe also more likely to be bug free.

And as a concrete example, rather than writing code yourself to multiply matrices, you can let Octave do it by typing A times B; that will use a very efficient routine to multiply the two matrices. And there's a bunch of examples like these where, if you use appropriate vectorized implementations, you get much simpler and much more efficient code.
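As a quick sketch of that contrast (here in Python with NumPy rather than Octave, but the idea is the same; the matrices are made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Unvectorized: writing the matrix multiply yourself with loops.
C_loop = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            C_loop[i, j] += A[i, k] * B[k, j]

# Vectorized: let the library's optimized routine do it,
# the analogue of typing A * B in Octave.
C_lib = A @ B
```

Both give the same product; the library call is shorter and typically much faster on large matrices.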

Let's look at some examples. Here's our usual hypothesis for linear

regression. And if you want to compute h of x, notice

that there's a sum on the right. And so, one thing you could do is compute the sum from j equals zero up to j equals n yourself.

Another way to think of this is to think of h of x as theta transpose x.

And what you can do is think of this as you're computing this inner product

between two vectors where theta is, you know, your vector, let's say, theta zero,

theta one, theta two. If you have two features, if n=2, and if

you think of x as this vector, x0, x1, x2.

And, these two views can give you two different implementations.

Here's what I mean. Here's an unvectorized implementation for

how to compute h of x and by unvectorized I mean without vectorization.

We might first initialize the prediction to be 0.0. This is eventually going to be our prediction, h of x. And then I'm going to have a for loop, for j equals one through n plus one, where prediction gets incremented by theta j times x j. So, it's kind of this expression over here. By the way, I should mention that these vectors I wrote over here, I had these vectors being zero-indexed.

So it has theta zero, theta one, theta two.

But because Matlab is one-indexed, theta zero in Matlab we would end up representing as theta one, this second element ends up as theta two, and this third element ends up as theta three, just because vectors in Matlab are indexed starting from one, even though I wrote theta and x here indexed starting from zero. Which is why here I have a for loop where j goes from one through n plus one, rather than from zero up to n, right? So, this is an unvectorized implementation in that we have a for loop that's, you know, summing up the n plus one

elements of the sum. In contrast, here's how you would write a

vectorized implementation which is that, you would think of

x and theta as vectors. And you just set prediction equals theta

transpose times x, so just computing it like so.

So instead of writing, you know, all these lines of code with a for loop, you instead just have one line of code. And what this line of code on the right will do is use Octave's highly optimized numerical linear algebra routines to compute this inner product between the two vectors, theta and x. And not only is the vectorized implementation simpler, it will also run much more efficiently.
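Here's what those two implementations of h of x might look like side by side, sketched in Python with NumPy (the values of theta and x are made up; indices run from zero as on the slide):

```python
import numpy as np

theta = np.array([0.5, 1.0, 2.0])  # theta_0, theta_1, theta_2 (n = 2)
x = np.array([1.0, 3.0, 4.0])      # x_0 = 1 by convention, then x_1, x_2

# Unvectorized: sum up theta_j * x_j yourself with a for loop.
prediction = 0.0
for j in range(len(theta)):
    prediction += theta[j] * x[j]

# Vectorized: theta transpose times x as a single inner product.
prediction_vec = theta @ x
```

The one-line inner product replaces the whole loop and gives the same number.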

4:15

So that was Octave. But the issue of vectorization applies to other programming languages as well. Let's look at an example in C++.

Here's what an unvectorized implementation might look like.

We again initialize, you know, a prediction to 0.0, and then

we now have a for loop, for j equals zero up to n, where prediction plus-equals theta j times x j. So again, you have this explicit for loop that you write yourself.

In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this. So, depending on the details of your numerical linear algebra library, you might have an object, a C++ object, which is the vector theta, and a C++ object which is the vector x. And you just take theta.transpose() times x, where this times is a C++ overloaded operator, so that you can just multiply these two vectors in C++.

And depending on, you know, the details of your numerical linear algebra library,

you might end up using a slightly different syntax.

But by relying on the library to do this inner product, you can get a much

simpler piece of code and a much more efficient one.

Let's now look at a more sophisticated example.

Just to remind you, here's our update rule for gradient descent for linear

regression. And so we update theta j using this rule for all values of j = 0,

1, 2 and so on. And if we just write out these equations for theta zero, theta

one, theta two, assuming we have two features, so n = 2, then these are the

updates we perform to theta zero, theta one, theta two, where you might remember

my saying in an earlier video, that these should be simultaneous updates.

So, let's see if we can come up with a vectorized implementation of this.

Here are my same three equations written in a slightly smaller font.

And you can imagine that one way to implement these three lines of code is to

have a for loop that says, you know, for j equals zero through two,

to update theta j, or something like that.

But instead, let's come up with a vectorized implementation and see if we can find a simpler way to basically compress these three lines of code, or that for loop which effectively does these three updates one at a time, into a single line of vectorized code. Here's the idea.

What I'm going to do is I'm going to think of theta as a vector,

and I'm going to update theta as theta minus alpha times some other vector delta, where delta is going to be equal to one over m times the sum from i equals one through m of this term over on the right,

okay? So, let me explain what's going on here.

Here, I'm going to treat theta as a vector, so this is an n plus one dimensional vector. And I'm saying that theta gets updated as this n plus one dimensional vector. Alpha is a real number, and delta here is a vector. So, this subtraction operation,

that's a vector subtraction, okay? because alpha times delta is a vector,

and so I'm saying theta gets, you know, this vector alpha times delta subtracted

from it. So, what is the vector delta? Well, this vector delta looks like this.

And what it is really meant to be is this thing over here. Concretely, delta will be an n plus one dimensional vector, and the very first

element of the vector delta is going to be equal to that.

So if we take delta and index it from zero, there's delta zero, delta one, delta two; what I want is for delta zero to be equal to, you know, this first box in green up above. And indeed, you might be able to convince yourself that delta zero is this one over m times the sum of h of x i minus y i, times x i zero. So, let's just make sure that we're on the

same page about how delta really is computed.

Delta is one over m times the sum over here. And, you know, what is the sum? Well, this term over here,

9:13

that's a real number. And the second term over here, x i, this term over there, is a vector, right? Because x i may be a vector that would be, say, x i zero, x i one, x i two, right?

And, what is the summation? Well, what the summation is saying is

that this term, that is, this term over here, is equal to (h of x1 minus y1) times x1, plus (h of x2 minus y2) times x2, plus,

you know, and so on. Okay?

Because this is a summation over i, so as i ranges from i = 1 through m,

you get these different terms, and you're summing up these terms here.

And the meaning of each of these terms, you know,

this is a lot like, if you remember, the earlier quiz in this video, right? You saw this equation, and we said that in order to vectorize this code, we would instead set u = 2v + 5w. So, we're saying that the vector u is

equal to two times the vector v plus five times the vector w.

So, this is an example of how to add different vectors.
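In code, that quiz example really is just one line; here's a sketch in Python with NumPy, with made-up vectors v and w:

```python
import numpy as np

v = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 1.0, 1.0])

# u = 2v + 5w: each term is a real number times a vector,
# and the vector sum is computed elementwise.
u = 2 * v + 5 * w
```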

And this summation is the same thing. This is saying that the summation over here is just some real number, right? That's kind of like the number two, or some other number, times the vector x1. So this is kind of like two times v, except it's some other real number times x1. And then plus, you know, instead of five times w, we instead have some other real number times some other vector. And then you add on other vectors, plus dot, dot, dot, plus the other vectors, which is why, overall, this thing over here, that whole

quantity, that delta, is just some vector. And concretely, if n = 2, the three elements of delta correspond exactly to this first thing, the second thing, and this third thing. Which is why, when you update theta according to theta minus alpha delta, we end up carrying out exactly the same simultaneous updates as the update rules that we have on top. So, I know that there was a lot that

happened on this slide, but again, feel free to pause the video. If you aren't sure what just happened, I'd encourage you to step through the slide to make sure you understand why it is that this update here, with this definition of delta, is equal to this update on top. And if it's still not clear, one insight is that, you know, this thing over here? That's exactly the vector x.

And so, we're just taking, you know, all three of these computations and

compressing them into one step, with this vector delta, which is why we can come up with a vectorized implementation of this step of linear regression this way. So, I hope this step makes sense; do look at the video again and see if you can understand it. And in case you don't quite understand the equivalence of this math, if you implement this, it turns out to be the right answer anyway. So even if you didn't quite understand the equivalence, if you just implement it this way, you'll be able to get linear regression to work. But if you're able to figure out why

these two steps are equivalent, then hopefully, that will give you a better

understanding of vectorization as well. And finally, you may be implementing linear regression using more than one or two features; sometimes we use linear regression with tens or hundreds or thousands of features. If you use the vectorized implementation of linear regression, your code will run much faster than if you had, say, your old for loop that was updating theta zero, then theta one, then theta two yourself. So, using a vectorized implementation,

you should be able to get a much more efficient implementation of linear

regression. And vectorizing the later algorithms that we'll see in this course is a good trick, whether in Octave or some other language like C++ or Java, for getting your code to run more efficiently.
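To make the gradient descent discussion above concrete, here is one possible sketch of the vectorized update, theta := theta - alpha * delta, in Python with NumPy. The training set X (with a leading column of ones for x_0), the targets y, the learning rate alpha, and the initial theta are all made-up illustration values, not anything from the lecture:

```python
import numpy as np

# Made-up training set: m = 4 examples, n = 2 features,
# with a leading column of ones for x_0.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0]])
y = np.array([3.0, 2.0, 1.0, 4.0])

theta = np.zeros(3)   # theta_0, theta_1, theta_2
alpha = 0.1
m = len(y)

# One gradient-descent step:
# delta = (1/m) * sum_i (h(x_i) - y_i) * x_i, then theta := theta - alpha * delta.
errors = X @ theta - y         # h(x_i) - y_i for every example i at once
delta = (X.T @ errors) / m     # the n plus one dimensional vector delta
theta = theta - alpha * delta  # all thetas updated together, no for loop over j
```

Every element of theta is updated in one shot, which is exactly the simultaneous update the earlier video asked for.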