In this video, I'd like to tell you about the idea of vectorization. Whether you're using Octave, a similar language like MATLAB, or Python with NumPy, or Java, C, or C++, all of these languages have, either built in or readily and easily accessible, numerical linear algebra libraries that are usually very well written and highly optimized, often developed by people who have PhDs in numerical computing or who really specialize in it. When you're implementing machine learning algorithms, if you're able to take advantage of these numerical linear algebra libraries and make routine calls to them, rather than writing code yourself to do things that these libraries could be doing, then often you get code that, first, is more efficient, so it runs more quickly and takes better advantage of any parallel hardware your computer may have. And second, it means you end up with less code to write, so a simpler implementation that is therefore also more likely to be bug free.

As a concrete example, rather than writing code yourself to multiply matrices, you can let Octave do it by typing A times B, and it will use a very efficient routine to multiply the two matrices. There are lots of examples like this where, if you use appropriate vectorized implementations, you get much simpler and much more efficient code.

Let's look at some examples. Here's our usual hypothesis for linear regression, h of x equals the sum over j of theta j times x j. If you want to compute h of x, notice that there's a sum on the right, so one thing you could do is compute the sum from j equals 0 to j equals n yourself. Another way to think of this is to think of h of x as theta transpose x, so that you're computing the inner product between two vectors, where theta is your vector, say theta 0, theta 1, theta 2 if you have two features, that is if n equals 2, and x is the vector x0, x1, x2. These two views give you two different implementations.

Here's what I mean. Here's an unvectorized implementation of how to compute h of x, and by unvectorized I mean without vectorization. We first initialize the prediction to 0.0; this is eventually going to be h of x. Then we have a for loop: for j equals 1 through n plus 1, prediction gets incremented by theta(j) times x(j). So it's this expression over here. By the way, I should mention that the vectors I wrote over here are zero-indexed, so there's theta 0, theta 1, theta 2. But because MATLAB is one-indexed, theta 0 would end up represented as theta(1), the second element ends up as theta(2), and the third element as theta(3), just because vectors in MATLAB are indexed starting from one, even though I wrote theta and x here indexed starting from zero. That is why the for loop here runs j from 1 through n plus 1 rather than from 0 up to n. So, this is an unvectorized implementation, in that we have a for loop summing up the n plus one elements of the sum. In contrast, here's how you would write a vectorized implementation: you think of x and theta as vectors, and you just set prediction equal to theta transpose times x, computing it like so.
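As a rough sketch of those two implementations in Octave, assuming theta and x are already defined as (n+1)-dimensional column vectors (the variable names here are just illustrative, not from the slide), the code might look something like this:

    % Unvectorized: explicit for loop over the elements of the sum.
    % Octave vectors are one-indexed, so theta(1) plays the role of theta 0.
    prediction = 0.0;
    for j = 1:n+1,
      prediction = prediction + theta(j) * x(j);
    end;

    % Vectorized: one line, using Octave's optimized inner product.
    prediction = theta' * x;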
So instead of writing all these lines of code with a for loop, you just have one line of code, and what this line of code on the right does is use Octave's highly optimized numerical linear algebra routines to compute the inner product between the two vectors theta and x. And not only is the vectorized implementation simpler, it will also run much more efficiently.

That was Octave, but the issue of vectorization applies to other programming languages as well. Let's look at an example in C++. Here's what an unvectorized implementation might look like. We again initialize a prediction to 0.0, and then we have a for loop: for j equals 0 up to n, prediction += theta[j] * x[j], where again you have this explicit for loop that you write yourself. In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this. Depending on the details of your numerical linear algebra library, you might have a C++ object which is the vector theta and a C++ object which is the vector x, and you just take theta.transpose() times x, where this times is a C++ overloaded operator, so you can just multiply these two vectors in C++. Depending on the details of your library, you might end up using a slightly different syntax, but by relying on the library to do this inner product, you get a much simpler piece of code and a much more efficient one.

Let's now look at a more sophisticated example. Just to remind you, here's our update rule for gradient descent for linear regression, where we update theta j using this rule for all values of j equals 0, 1, 2, and so on. If we just write out these equations for theta 0, theta 1, theta 2, assuming we have two features, so n equals 2, then these are the updates we perform to theta 0, theta 1, theta 2, where you might remember my saying in an earlier video that these should be simultaneous updates.

So, let's see if we can come up with a vectorized implementation of this. Here are my same three equations written in a slightly smaller font. You can imagine that one way to implement these three lines of code is to have a for loop that says, for j equals 0 through 2, update theta j, or something like that; I'll show a sketch of that for-loop version just below. But instead, let's come up with a vectorized implementation and see if we can find a simpler way to compress these three updates, or the for loop that effectively does them one at a time, into one line of vectorized code.

Here's the idea. I'm going to think of theta as a vector, and I'm going to update theta as theta minus alpha times some other vector delta, where delta is equal to one over m times the sum from i equals 1 through m of (h of x(i) minus y(i)) times x(i), that is, this term over on the right. So, let me explain what's going on here. I'm treating theta as a vector, an n-plus-one-dimensional vector, and I'm saying that theta gets updated as an n-plus-one-dimensional vector. Alpha is a real number, and delta here is a vector, so this subtraction operation is a vector subtraction, because alpha times delta is a vector, and I'm saying theta gets that vector, alpha times delta, subtracted from it.
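Here is the rough sketch of the for-loop version mentioned above, in Octave. It assumes X is an m-by-(n+1) matrix whose rows are the training examples (with a leading column of ones), y is the m-by-1 vector of labels, and alpha is the learning rate; these names are illustrative assumptions, not taken from the slide.

    % Unvectorized gradient descent step, using a temporary vector so that
    % all the theta j are updated simultaneously.  Octave indexes from 1,
    % so theta(1) corresponds to theta 0 on the slide.
    temp = zeros(n+1, 1);
    for j = 1:n+1,
      s = 0;
      for i = 1:m,
        h = X(i,:) * theta;              % hypothesis h(x(i)) for example i
        s = s + (h - y(i)) * X(i,j);     % accumulate (h(x(i)) - y(i)) * x(i) sub j
      end;
      temp(j) = theta(j) - alpha * (1/m) * s;
    end;
    theta = temp;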
So, what is the vector delta? Well, this vector delta looks like this, and it is really meant to be this thing over here. Concretely, delta is an n-plus-one-dimensional vector, and the very first element of delta is going to be equal to that. So if we index delta from zero, there's delta 0, delta 1, delta 2, and what I want is for delta 0 to equal this first box in green up above. And indeed, you can convince yourself that delta 0 is one over m times the sum over i of (h of x(i) minus y(i)) times x(i) sub 0.

So, let's just make sure we're on the same page about how delta is really computed. Delta is one over m times this sum over here, and what is the sum? Well, this term over here, h of x(i) minus y(i), is a real number, and the second term, x(i), is a vector, because x(i) may be a vector with elements, say, x(i) sub 0, x(i) sub 1, x(i) sub 2. And what is the summation saying? It's saying that this term over here is equal to (h of x(1) minus y(1)) times x(1), plus (h of x(2) minus y(2)) times x(2), plus and so on, because this is a summation over i, so as i ranges from 1 through m, you get these different terms, and you're summing them up.

The meaning of each of these terms is a lot like the earlier quiz, if you remember, where you saw this equation and we said that, in order to vectorize the code, we would instead set u = 2v + 5w, meaning the vector u is equal to two times the vector v plus five times the vector w. So that was an example of how to add up scaled vectors, and this summation is the same thing. This term over here, h of x(1) minus y(1), is just some real number, kind of like the number two, multiplying the vector x(1), so it's like two times v, except with some other number times x(1). And then, instead of plus five times w, we have some other real number times some other vector, and then you add on further vectors, plus dot dot dot, plus the remaining terms. That is why, overall, this thing over here, that whole quantity delta, is just some vector. And concretely, if n equals 2, the three elements of delta correspond exactly to this first thing, the second thing, and the third thing, which is why, when you update theta according to theta minus alpha delta, we end up carrying out exactly the same simultaneous updates as the update rules we have on top.

So, I know there was a lot that happened on this slide, but again, feel free to pause the video, and if you aren't sure what just happened, I'd encourage you to step through the slide and make sure you understand why this update here, with this definition of delta, is equal to the update on top. And if it's still not clear, one insight is that this thing over here is exactly the vector x, so we're just taking all three of these computations and compressing them into one step with this vector delta, which is why we can come up with a vectorized implementation of this step of linear regression this way. So, I hope this step makes sense, and do look over the slide again and see if you can understand it.
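Putting that together, a minimal Octave sketch of the vectorized update, under the same assumed names X, y, theta, alpha, and m as in the sketch above, could be:

    % delta = (1/m) * sum over i of (h(x(i)) - y(i)) * x(i), computed in one shot:
    % X * theta is the m-by-1 vector of predictions, (X * theta - y) the errors,
    % and X' times the errors sums the error-weighted examples into an
    % (n+1)-by-1 vector.
    delta = (1/m) * (X' * (X * theta - y));
    theta = theta - alpha * delta;   % simultaneous update of all the theta j

This replaces the double for loop in the earlier sketch with a couple of matrix operations, which is exactly the kind of work the numerical linear algebra library is optimized for.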
In case you don't quite follow the equivalence of this math, if you implement it this way, it turns out to be the right answer anyway. So even if you didn't quite understand the equivalence, if you just implement it this way, you'll be able to get linear regression to work. But if you're able to figure out why these two steps are equivalent, then hopefully that will give you a better understanding of vectorization as well.

Finally, if you're implementing linear regression with more than one or two features, and sometimes we use linear regression with tens or hundreds or thousands of features, then if you use a vectorized implementation of linear regression, your code will run much faster than if you had your old for loop that was updating theta 0, then theta 1, then theta 2 by itself. So, using a vectorized implementation, you should be able to get a much more efficient implementation of linear regression. And vectorizing the later algorithms that we'll see in this course is a good trick, whether in Octave or some other language like C++ or Java, for getting your code to run more efficiently.