0:00

In the previous video,

we saw how, with your training examples stacked up horizontally in the matrix X,

you can derive a vectorized implementation for forward propagation through your neural network.

Let's give a bit more justification for why the equations we wrote

down are a correct implementation of vectorizing across multiple examples.

So let's go through part of the forward propagation calculation for a few examples.

Let's say that for the first training example,

you end up computing W1 times x1 plus b1, and then for the second training example,

you end up computing W1 times x2 plus b1, and

then for the third training example,

you end up computing W1 times x3 plus b1.

So, just to simplify the explanation on this slide, I'm going to ignore b.

So let's just say, to simplify this justification a little bit, that b is equal to zero.

But the argument we're going to lay out will work with

just a little bit of a change even when b is non-zero.

It does just simplify the description on the slide a bit.
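As a rough NumPy sketch of this one-example-at-a-time computation (the variable names and sizes here are illustrative, not taken from the course code, and b1 is zero per the simplification above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 3                   # input features, hidden units, examples

W1 = rng.standard_normal((n_h, n_x))    # weight matrix of layer 1
b1 = np.zeros((n_h, 1))                 # bias, set to zero as in the simplification
xs = [rng.standard_normal((n_x, 1)) for _ in range(m)]  # x1, x2, x3 as column vectors

# one training example at a time: z1(i) = W1 @ x(i) + b1
zs = [W1 @ x + b1 for x in xs]
```

Each element of `zs` is one of the column vectors z11, z12, z13 discussed below.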

Now, W1 is going to be some matrix, right?

So I have some number of rows in this matrix.

So if you look at this calculation with x1,

what you have is

that W1 times x1 gives you some column vector, which I'll draw like this.

And similarly, if you look at this vector x2,

you have that W1 times

x2 gives some other column vector, right?

And that gives you this z12.

And finally, if you look at x3,

you have that W1 times x3

gives you some third column vector; that's this z13.

So now, if you consider the training set capital X,

which we form by stacking together all of our training examples,

the matrix capital X is formed by taking the vector x1 and

stacking it horizontally with x2 and then also x3.

This is if we have only three training examples.

If you have more, you know, they'll keep stacking horizontally like that.

But if you now take the matrix W1 and multiply it by this matrix X, then,

if you think about how matrix multiplication works,

you end up with the first column being

these same values that I had drawn up there in purple.

The second column will be those same four values.

And the third column will be those orange values,

whatever they turn out to be.

But of course this is just equal to z11 expressed as

a column vector, followed by z12 expressed as a column vector, followed by z13,

also expressed as a column vector.

And this is if you have three training examples.

If you have more examples, then there'd be more columns.

And so, this is just our matrix capital Z1.
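A quick way to convince yourself of this column-by-column argument in NumPy (again with illustrative names and sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3))                        # layer-1 weight matrix
x1, x2, x3 = (rng.standard_normal((3, 1)) for _ in range(3))

X = np.hstack([x1, x2, x3])   # stack the examples as columns: shape (3, 3)
Z1 = W1 @ X                   # one matrix multiply, shape (4, 3)

# column i of Z1 equals W1 @ xi, the per-example result
assert np.allclose(Z1[:, [0]], W1 @ x1)
assert np.allclose(Z1[:, [1]], W1 @ x2)
assert np.allclose(Z1[:, [2]], W1 @ x3)
```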

So I hope this gives a justification for why we had

previously W1 times xi equals

z1i when we were looking at a single training example at a time.

When you take the different training examples and stack them up in different columns,

the corresponding result is that you end up

with the z's also stacked up in columns.

And I won't show it, but you can convince yourself, if you want, that with Python broadcasting,

if you add back in

these values of b, the values are still correct.

What actually ends up happening is that, with Python broadcasting,

you end up adding b1 individually to each of the columns of this matrix.
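That broadcasting behavior can be checked in a couple of lines (the values here are made up just to show the mechanics):

```python
import numpy as np

Z = np.arange(12, dtype=float).reshape(4, 3)   # stand-in for W1 @ X, 3 examples
b1 = np.array([[1.0], [2.0], [3.0], [4.0]])    # bias as a (4, 1) column vector

# broadcasting adds b1 to every column of Z, i.e. to every example at once
out = Z + b1
assert np.allclose(out[:, [2]], Z[:, [2]] + b1)
```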

So on this slide, I've only justified that Z1 equals

W1 X plus b1 is

a correct vectorization of

the first of the four steps we have on the previous slide,

but it turns out that a similar analysis allows you to

show that the other steps also work, using

very similar logic: if you stack the inputs in columns, then after applying the equation,

you get the corresponding outputs also stacked up in columns.

Finally, let's just recap everything we talked about in this video.

If this is your neural network,

we said that this is what you need to do if you were to implement forward propagation

one training example at a time, going from i equals 1 through m. And then we said,

let's stack up the training examples in columns like so, and for each of these values z1,

a1, z2, a2, let's stack up the corresponding columns as follows.

So this is an example for a1, but this is true for z1,

a1, z2, and a2.

Then what we showed on the previous slide was that

this line allows you to vectorize this across all m examples at the same time.

And it turns out, with similar reasoning,

you can show that all of the other lines are

correct vectorizations of all four of these lines of code.
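Putting the four vectorized lines together, a sketch of forward propagation for this two-layer network might look like the following in NumPy (sigmoid for both layers as in the video so far; the layer sizes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, n_h, n_y, m = 3, 4, 1, 5            # features, hidden units, outputs, examples

X  = rng.standard_normal((n_x, m))       # A0: examples stacked in columns
W1 = rng.standard_normal((n_h, n_x)); b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)); b2 = np.zeros((n_y, 1))

# the four vectorized steps, processing all m examples at once
Z1 = W1 @ X + b1        # shape (n_h, m)
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2       # shape (n_y, m)
A2 = sigmoid(Z2)
```

Note how each example stays in its own column from X all the way through to A2.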

And just as a reminder,

because X is also equal to A0, since remember that

the input feature vector x was equal to a0, so xi equals a0i,

there's actually a certain symmetry to

these equations, where this first equation can also be

written Z1 equals W1 A0 plus b1.

And so, you see that this pair of equations and this pair of

equations actually look very similar, but with all of the indices advanced by one.

So this kind of shows that the different layers of a neural network are

roughly doing the same thing, or just doing the same computation over and over.

And here we have a two-layer neural network; when we go to

much deeper neural networks in next week's videos,

you'll see that even deeper neural networks are basically taking

these two steps and just doing them even more times than you're seeing here.

So that's how you can vectorize your neural network across multiple training examples.

Next: we've so far been using the sigmoid function throughout our neural networks.

It turns out that's actually not the best choice.

In the next video, let's dive a little bit

further into how you can use different, what are called,

activation functions, of which the sigmoid function is just one possible choice.
