0:00

In the last video, you saw how to compute the prediction of a neural network, given a single training example. In this video, you'll see how to vectorize across multiple training examples. The outcome will be quite similar to what you saw for logistic regression: by stacking up different training examples in different columns of a matrix, you'll be able to take the equations you had from the previous video and, with very little modification, change them to make the neural network compute the outputs on all the examples pretty much all at the same time. So let's see the details of how to do that.

These were the four equations we had from the previous video for how you compute z[1], a[1], z[2], and a[2]. They tell you how, given an input feature vector x, you can use them to generate a[2] = y hat for a single training example.
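The four equations above might be sketched in NumPy as follows. This is a minimal illustration, not the course's code: the layer sizes (3 input features, 4 hidden units, 1 output unit) and the random initialization are assumptions made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical sizes: 3 input features, 4 hidden units, 1 output unit.
n_x, n_1, n_2 = 3, 4, 1
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((n_1, n_x)), np.zeros((n_1, 1))
W2, b2 = rng.standard_normal((n_2, n_1)), np.zeros((n_2, 1))

x = rng.standard_normal((n_x, 1))   # a single training example, a column vector

z1 = W1 @ x + b1     # z[1] = W[1] x + b[1]
a1 = sigmoid(z1)     # a[1] = sigma(z[1])
z2 = W2 @ a1 + b2    # z[2] = W[2] a[1] + b[2]
a2 = sigmoid(z2)     # a[2] = y hat, the prediction for this one example
```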

0:54

Now, if you have m training examples, you need to repeat this process for, say, the first training example, x superscript (1), to compute y hat (1), the prediction on your first training example. Then take x(2) and use that to generate prediction y hat (2), and so on, down to x(m) to generate prediction y hat (m). And so in the activation function notation as well, I'm going to write this as a[2](1), and this is a[2](2), and so on down to a[2](m). So in this notation a[2](i), the round bracket i refers to training example i, and the square bracket 2 refers to layer 2, okay?

2:04

So, if you have an unvectorized implementation and want to compute the predictions on all your training examples, you need to do: for i = 1 to m, basically implement these four equations, right? You need to compute z[1](i) = W[1] x(i) + b[1], a[1](i) = sigma(z[1](i)), z[2](i) = W[2] a[1](i) + b[2], and a[2](i) = sigma(z[2](i)).
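That unvectorized for loop might look like this sketch in NumPy. The sizes and the random data are hypothetical, chosen only to make the loop runnable; each iteration applies the four equations to one column of X.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical sizes: n_x features, n_1 hidden units, 1 output, m examples.
n_x, n_1, m = 3, 4, 5
rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((n_1, n_x)), np.zeros((n_1, 1))
W2, b2 = rng.standard_normal((1, n_1)), np.zeros((1, 1))
X = rng.standard_normal((n_x, m))   # training examples stacked in columns

predictions = []
for i in range(m):                  # "for i = 1 to m"
    x_i = X[:, i:i+1]               # keep the (n_x, 1) column-vector shape
    z1 = W1 @ x_i + b1              # z[1](i) = W[1] x(i) + b[1]
    a1 = sigmoid(z1)                # a[1](i) = sigma(z[1](i))
    z2 = W2 @ a1 + b2               # z[2](i) = W[2] a[1](i) + b[2]
    a2 = sigmoid(z2)                # a[2](i) = y hat (i)
    predictions.append(a2)
```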

So it's basically these four equations from the top, with the superscript round bracket (i) added to all the variables that depend on the training example. So you add this superscript round bracket (i) to x, z, and a if you want to compute all the outputs on your m training examples. What we'd like to do is vectorize this whole computation, so as to get rid of this for loop. And by the way, in case it seems like I'm going through a lot of nitty-gritty linear algebra, it turns out that being able to implement this correctly is important in the deep learning era. We actually chose the notation very carefully for this course to make these vectorization steps as easy as possible. So I hope that going through this nitty-gritty will actually help you more quickly get correct implementations of these algorithms working.

3:59

So here's what we have from the previous slide, with the for loop going over our m training examples. Recall that we defined the matrix X to be equal to our training examples stacked up in columns like so. So you take the training examples and stack them in columns, and this becomes an n, or maybe n_x by m, dimensional matrix.

4:29

I'm just going to give away the punch line and tell you what you need to implement in order to have a vectorized implementation of this for loop. It turns out what you need to do is compute Z[1] = W[1] X + b[1], A[1] = sigma(Z[1]), then Z[2] = W[2] A[1] + b[2], and then A[2] = sigma(Z[2]).
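A sketch of this vectorized version, under the same hypothetical sizes as before: the loop over examples disappears, the bias vectors broadcast across the m columns, and the first column of A2 matches what the per-example computation would give.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical sizes: n_x features, n_1 hidden units, 1 output, m examples.
n_x, n_1, m = 3, 4, 5
rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((n_1, n_x)), np.zeros((n_1, 1))
W2, b2 = rng.standard_normal((1, n_1)), np.zeros((1, 1))
X = rng.standard_normal((n_x, m))   # (n_x, m): examples in columns

# Vectorized forward pass: no loop over training examples.
Z1 = W1 @ X + b1     # (n_1, m); b1 broadcasts across the m columns
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2    # (1, m)
A2 = sigmoid(Z2)     # one prediction per column

# Sanity check: column 1 of A2 equals the single-example computation on x(1).
a2_first = sigmoid(W2 @ sigmoid(W1 @ X[:, :1] + b1) + b2)
assert np.allclose(A2[:, :1], a2_first)
```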

So the analogy is that we went from the lowercase vectors x to the capital matrix X by stacking up the lowercase x's in different columns. If you do the same thing for the z's, so, for example, if you take z[1](1), z[1](2), and so on, and these are all column vectors, up to z[1](m), right, so that's this first quantity, all m of them, and stack them in columns, then that just gives you the matrix Z[1]. And similarly, you look at, say, this quantity and take a[1](1), a[1](2), and so on up to a[1](m), and stack them up in columns. Then, just as we went from lowercase x to capital X, and lowercase z to capital Z, this goes from the lowercase a's, which are vectors, to this capital A[1] over there. And similarly, Z[2] and A[2] are also obtained by taking these vectors and stacking them horizontally.
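The horizontal stacking described above can be sketched directly with NumPy; the per-example column vectors here are made-up placeholders standing in for the a[1](i).

```python
import numpy as np

# Hypothetical per-example activations a[1](i), each a column vector (n_1, 1).
n_1, m = 4, 3
rng = np.random.default_rng(3)
a1_examples = [rng.standard_normal((n_1, 1)) for _ in range(m)]

# Stack them horizontally: column i of A1 is a[1](i).
A1 = np.hstack(a1_examples)
assert A1.shape == (n_1, m)
assert np.array_equal(A1[:, 2:3], a1_examples[2])
```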

One property of this notation that might help you think about it is that in these matrices, say Z and A, horizontally we index across training examples. So the horizontal index corresponds to different training examples: when you sweep from left to right, you're scanning through the training set. And vertically, the vertical index corresponds to different nodes in the neural network. So, for example, the value in the topmost, leftmost corner of the matrix corresponds to the activation of the first hidden unit on the first training example. One value down corresponds to the activation of the second hidden unit on the first training example, then the third hidden unit on the first training example, and so on. So as you scan down, you're indexing into the hidden unit numbers.

7:39

Whereas if you move horizontally, then you go from the first hidden unit on the first training example to the first hidden unit on the second training example, the third training example, and so on, until this node here corresponds to the activation of the first hidden unit on the final training example, the mth training example.
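This indexing convention can be illustrated with a small, made-up activation matrix, where rows index hidden units and columns index training examples:

```python
import numpy as np

n_1, m = 4, 5                          # hypothetical: 4 hidden units, 5 examples
A1 = np.arange(n_1 * m).reshape(n_1, m).astype(float)

# Row index -> hidden unit, column index -> training example.
value = A1[1, 2]                       # second hidden unit, third training example

# Sweeping left to right along a row scans through the training set:
first_unit_all_examples = A1[0, :]     # shape (m,)
# Scanning down a column moves through hidden units on one example:
first_example_all_units = A1[:, 0]     # shape (n_1,)
```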

8:42

So with these equations, you now know how to implement your neural network with vectorization, that is, vectorization across multiple examples. In the next video, I want to show you a bit more justification for why this is a correct implementation of this type of vectorization. It turns out the justification will be similar to what you had seen [INAUDIBLE]. Let's go on to the next video.
