0:00

In the last video, you saw what a single hidden layer neural network looks like. In this video, let's go through the details of exactly how this neural network computes its outputs. What you'll see is that it's like logistic regression, repeated a lot of times. Let's take a look.

So this is what a two-layer neural network looks like. Let's go more deeply into exactly what this neural network computes. Now, we said before that the circle in logistic regression really represents two steps of computation: first you compute z as follows, and second you compute the activation as a sigmoid function of z. A neural network just does this a lot more times.

Let's start by focusing on just one of the nodes in the hidden layer, and let's look at the first node in the hidden layer, so I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, and think of it as the left half of this node, is that it computes z = w^T x + b. The notation we'll use is that these are all quantities associated with the first hidden layer, so that's why we have a bunch of square brackets there, and this is the first node in the hidden layer, so that's why we have the subscript 1 over there. So first it does that, and then the second step is that it computes a_1^[1] = sigmoid(z_1^[1]), like so. For both z and a, the notational convention is that in a_i^[l], the l in the superscript square brackets refers to the layer number, and the i subscript refers to the node in that layer. So the node we're looking at is in layer 1, that is the hidden layer, node 1, and that's why the superscript and subscript are both 1. So that little circle, that first node in the neural network, represents carrying out these two steps of computation.

Â at the second node in your network the

Â second node in the hidden layer of in

Â your network similar to the logistic

Â regression unit on the left this little

Â circle represents two steps of

Â computation the first step is a

Â confusing Z this is still layer 1

Â pronounced the second node equals W

Â transpose X plus V

Â - and then a 1/2 equals Sigma z12 and

Â again feel free to pause the video if

Â you want but you can double check that

Â the superscript and subscript notation

Â is consistent with what we have written

Â here above in purple so we'll talk

Â through the first two hidden units in

Â the neural network on hidden units three

Â and four also represents some

Â computations so now let me take this

Â pair of equations and this pair of

Â equations and let's copy them to the

Â moon fly so here's our network and

Â here's the first and there's a second

Â equations they were worked on previously

Â for the first and the second hidden

Â units if you then go through and write

Â out the corresponding equations for the

Â third and fourth hidden units you get

Â the following and those make sure this

Â notation is clear this is the vector W 1

Â 1 this is a vector transpose x I think

Â so that's what the superscript G there

Â represents this is a vector transpose

Â now as you might have guessed if you're

Â actually implementing in your network

Â doing this with a for loop seems really

Â inefficient so what we're going to do is

Â take these four equations and vectorize

Â so I'm going to start by showing how to

Â compute Z as a vector it turns out you

Â could do it as follows

Â let me take these WS and stack them into

Â a matrix then you have W 1 1 transpose

Â so that's a row vector of the column

Â vector transpose gives you a row vector

Â then W 1 2 transpose W 1 3 transpose of

Â V 1 4 transpose and so this by stacking

Â goes from for W vectors together you end

Â up with a matrix so another way to think

Â of this is that we have for logistic

Â regression unions there and each of the

Â logistic regression unions have a

Â corresponding parameter vector W and by

Â stacking those four vectors together you

Â end up with this four by three matrix so

Â if you then take this matrix and

Â multiply it by your input features x1 x2

Â x3 you end up with by our matrix

Â multiplication works you end up with w1

Â 1 transpose x w1 w2 1 transpose X of U 3

Â 1 transyl

Â XW 1 transpose X and then let's not

Â forget the bees so we now add to this a

Â vector

Â b11 b12 b13 in 1/4 so that they see this

Â then this is b11 b12 b13 e 1/4 and so

Â you see that each of the four rows of

Â this outcome correspond exactly to each

Â of these four rows of each these four

Â quantities that we had above so in other

Â words we've just shown that this thing

Â is therefore equal to V 1 1 V 1 to V 1 V

Â V 1 4 right as defined here and maybe

Â not surprisingly we're going to call

Â this whole thing the vector V 1 which is

Â taken by stacking up these um

Â individuals of these into a column

Â vector when we're vectorizing one of the

Â rules of thumb that might help you

Â navigate this is that when we have

Â different nodes in a layer or stack them

Â vertically so that's why when you have Z

Â 1 1 2 Z 1 for those correspond to four

Â different nodes in the hidden layer and

Â so we stack these four numbers

Â vertically to form the vectors V 1 and

Â reduce one more piece of notation this 4

Â by 3 matrix here which we obtained by

Â stacking the lower case you know W 1 1 W

Â 1 2 and so on we're going to call this

Â matrix W capital 1 and similarly this

Â vector or going to call B superscript 1

Â square bracket and so this is a 4 by 1

Â vector so now we've computed Z using

Â this vector matrix notation the last

Â thing we need to do is also compute

Â these values of a and so probably won't

Â surprise you to see that we're going to

Â define a 1 as just stacking together

Â those activation values a11 to a14 so

Â just take these 4 values and stack them

Â together in a vector called a1 and this

Â is going to be sigmoid of z1 where

Â there's no husband implementation of the

Â sigmoid function that takes in the four

Â elements of Z and applies the sigmoid

Â function element wise to it so just a

Â recap we figured out that z1 is equal to

Â W 1 times the vector X plus the vector B

Â 1 and a 1

Â is sigmoid x z 1 let's just copy this to

Â the next slide and what we see is that

Â for the first layer of the neural

Â network given an input X we have that z1

Â is equal to w1 times X plus B 1 and a 1

Â is sick point of z1 and the dimensions

Â of this are 4 by 1 equals this is a 4 by

Â 3 matrix times a 3 by 1 vector plus a 4

Â by 1 vector B and this is 4 by 1 same

Â dimensions and remember that we said X

Â is equal to a 0 right just like Y hat is

Â also equal to a 2 so if you want you can

Â actually take this X and replace it with

Â a 0 since a 0 is if you want as an alias

Â for the vector of input futures X now

Â through a similar derivation you can

Â figure out that the representation for

Â the next layer can also be written

Â similarly where what the output layer

Â does is it has associated with it so the

Â parameters W 2 and B 2 so W 2 in this

Â case is going to be a 1 by 4 matrix and

Â B 2 is just a real number as 1 by 1 and

Â so V 2 is going to be a real numbers

Â right as a 1 by 1 matrix is going to be

Â a 1 by 4 thing times a was 4 by 1 plus B

Â 2 is 1 by 1 and so this gives you just a

Â real number and if you think of this

Â loss output unit as just being analogous

Â to logistic regression which had

Â parameters W and B on W really plays in

Â nablus role to W 2 transpose or W 2's

Â really W transpose and B is equal to B 2

Â right similar to you know cover up the

Â left of this network and ignore all that

Â for now then this is just this last

Â output unit there's a lot like logistic

Â regression except that instead of

Â writing the parameters as WMV we're

Â writing them as W 2 and B 2 with

Â dimensions one by four and one by one so

Â just a recap for logistic regression to

Â implement the output or the influence

Â prediction you compute Z equals W

Â transpose X plus B and a y hat equals a

Â equals sigmoid of z

Â when you have a new network who have one

Â fit in there what you need to implement

Â two computers output is just the four

Â equation and you can think of this as a

Â vectorized implementation of computing

Â the output of first these four

Â logistical russian units and hitting

Â there that's what this does and then

Â this which is regression in the output

Â layer which is what this does

Â I hope this description made sense but

Â takeaway is to compute the output of

Â this neural network all you need is

Â those four lines of code so now you've

Â seen how given a single input feature

Â vector at you can with four lines of

Â code compute the outputs of this viewer

Â Network um similar to what we did for

Â logistic regression will also want to

Â vectorize across multiple training

Â examples and we'll see that by stacking

Â up training examples in different colors

Â in the matrix or just slight

Â modification to this you also similar to

Â what you saw in which is regression be

Â able to compute the output of this

Â neural network not just on one example

Â at a time belong your say your

Â anti-trade set at a time so let's see

Â the details of that in the next video

Â