In a previous video you saw the basic blocks of implementing a deep neural network: a forward propagation step for each layer and a corresponding backward propagation step. Let's see how you can actually implement these steps. We'll start with forward propagation. Recall that what this will do is input a[l-1] and output a[l] and the cache z[l], and we just said that, from an implementational point of view, maybe we'll cache W[l] and b[l] as well, just to make the function calls a little easier in the programming exercise. The equations for this should already look familiar. The way to implement the forward function is just: z[l] = W[l] a[l-1] + b[l], and then a[l] equals the activation function g[l] applied to z[l]. And if you want a vectorized implementation, then it's just Z[l] = W[l] A[l-1] + b[l], with b[l] being added via Python broadcasting, and A[l] = g[l] applied element-wise to Z[l].
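As a concrete illustration, here is a minimal numpy sketch of this single-layer forward step. The function name `linear_activation_forward`, the `activation` flag, and the exact cache layout are my own illustrative assumptions, not anything fixed by the video:

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation="relu"):
    """One forward step: Z = W A_prev + b, then A = g(Z).

    A_prev: activations from the previous layer, shape (n[l-1], m)
    W, b:   this layer's parameters, shapes (n[l], n[l-1]) and (n[l], 1)
    Returns A and a cache of (A_prev, W, b, Z) for the backward pass.
    """
    Z = W @ A_prev + b                 # b is broadcast across the m columns
    if activation == "sigmoid":
        A = 1 / (1 + np.exp(-Z))
    else:                              # ReLU
        A = np.maximum(0, Z)
    cache = (A_prev, W, b, Z)          # stash what backprop will need
    return A, cache
```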

And remember, on the diagram for the forward step, we have this chain of boxes going forward, and you initialize it by feeding in a[0], which is equal to x. So the input to the first box is really a[0], the input features: either x for a single training example, if you're processing one example at a time, or A[0] = X if you're processing the entire training set. That's the input to the first forward function in the chain, and then just repeating this allows you to compute forward propagation from left to right.
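Assuming a helper like the `linear_activation_forward` sketch above and a `parameters` dictionary keyed by layer (both illustrative choices on my part), the left-to-right chain might look like this:

```python
def L_model_forward(X, parameters, L):
    """Forward prop through L layers, starting from A[0] = X.

    Uses ReLU for layers 1..L-1 and sigmoid for the output layer,
    collecting one cache per layer for the backward pass.
    """
    caches = []
    A = X                                      # a[0] is the input features
    for l in range(1, L):
        A, cache = linear_activation_forward(
            A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
    # output layer: sigmoid, e.g. for binary classification
    AL, cache = linear_activation_forward(
        A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches
```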

Next, let's talk about the backward propagation step. Here your goal is to input da[l] and output da[l-1], as well as dW[l] and db[l]. Let me just write out the steps you need to compute these things: dz[l] = da[l], element-wise product with g[l]'(z[l]); then compute the derivative dW[l] = dz[l] times a[l-1] transpose. Note that I didn't explicitly put a[l-1] in the cache, but it turns out you need it as well. Then db[l] = dz[l], and finally da[l-1] = W[l] transpose times dz[l]. Okay.

And I don't want to go through the detailed derivation for this, but it turns out that if you take this definition of da[l] and plug it in here, then you get the same formula as we had previously for how to compute dz[l] as a function of dz[l+1], the one computed in the previous backward step. In fact, if I just plug that in, you end up with dz[l] = W[l+1] transpose dz[l+1], element-wise product with g[l]'(z[l]). I know this looks like a lot of algebra, but you can actually double-check for yourself that this is the equation we had written down for backpropagation last week, when we were doing a neural network with just a single hidden layer. And as a reminder, this times this is an element-wise product. So all you need is those four equations to implement your backward function.
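To make that substitution concrete, here is the one-step check written out in the same layer notation (a standard derivation step, nothing beyond what the video states):

```latex
% Substitute da^{[l]} = W^{[l+1]T} dz^{[l+1]} into dz^{[l]} = da^{[l]} \circ g^{[l]\prime}(z^{[l]}):
\begin{aligned}
dz^{[l]} &= da^{[l]} \circ g^{[l]\prime}\!\left(z^{[l]}\right) \\
         &= \left(W^{[l+1]T}\, dz^{[l+1]}\right) \circ g^{[l]\prime}\!\left(z^{[l]}\right)
\end{aligned}
```

which is exactly the backpropagation formula used last week for the single-hidden-layer network.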

And then finally, let me just write out the vectorized version. The first line becomes dZ[l] = dA[l], element-wise product with g[l]'(Z[l]); maybe no surprise there. dW[l] becomes (1/m) dZ[l] A[l-1] transpose, and db[l] becomes (1/m) np.sum(dZ[l], axis=1, keepdims=True); we talked about the use of np.sum in the previous week to compute db. And then finally, dA[l-1] = W[l] transpose times dZ[l]. So this allows you to input the quantity dA[l] over here and output dW[l] and db[l], the derivatives you need, as well as dA[l-1]. So that's how you implement the backward function.
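Here is a minimal numpy sketch of that vectorized backward step, mirroring the forward helper above; again, the function name and cache layout are assumptions for illustration:

```python
def linear_activation_backward(dA, cache, activation="relu"):
    """One backward step: from dA[l] and this layer's cache,
    compute dA[l-1], dW[l], and db[l]."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]                      # number of examples

    # dZ = dA * g'(Z), element-wise
    if activation == "sigmoid":
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)
    else:                                    # ReLU: g'(Z) is 1 where Z > 0
        dZ = dA * (Z > 0)

    dW = (1 / m) * dZ @ A_prev.T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ
    return dA_prev, dW, db
```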

So just to summarize: you take the input x; you might have the first layer, which maybe has a ReLU activation function; then it goes to the second layer, which maybe uses another ReLU activation function; then to the third layer, which maybe has a sigmoid activation function if you're doing binary classification; and this outputs ŷ. Then using ŷ you can compute the loss, and this allows you to start your backward iteration. I'll draw the arrows first; I guess I don't have to change pens too much. You then have backprop compute the derivatives: compute dW[3], db[3], dW[2], db[2], dW[1], db[1]. Along the way you would be computing, I guess, the cache, which would transfer z[1], z[2], z[3], and here you pass backward da[2] and da[1]. This could compute da[0], but we won't use that, so you can just discard it. And so this is how you implement forward prop and backprop for a three-layer neural network.
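Since the summary mentions computing the loss from ŷ, here is one small sketch of the binary cross-entropy cost; the clipping line is my own addition for numerical safety, not part of the video:

```python
def compute_cost(AL, Y):
    """Cross-entropy cost for binary classification.

    AL: predictions y-hat from the sigmoid output layer, shape (1, m)
    Y:  true labels, shape (1, m)
    """
    m = Y.shape[1]
    AL = np.clip(AL, 1e-12, 1 - 1e-12)       # avoid log(0)
    return -(1 / m) * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
```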

Now, there's just one last detail I didn't talk about, which is: for the forward recursion, we would initialize it with the input data x. How about the backward recursion? Well, it turns out that da[L], when you're using logistic regression, when you're doing binary classification, is equal to negative y over a, plus (1 - y) over (1 - a). So it turns out that the derivative of the loss function with respect to the output, with respect to ŷ, can be shown to be equal to this. If you're familiar with calculus, you can take the loss function L, take the derivative with respect to ŷ, or with respect to a, and show that you get that formula. So this is the formula you should use for da for the final layer, capital L.

final layer capital L and of course if

you were to have a vectorized

implementation then you initialize the

backward recursion not with this there

will be a capital A for the layer L

which is going to be you know the same

thing for the different examples right

over a for the first training example

plus 1 minus y for the first training

example over 1 minus 8 for the first

training example not down to the M

training example 1 minus a of M so

that's how you taught implement the

vectorized version that's how you

initialize the vectorized version of

background brocation so you've now seen
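Putting the last two pieces together, a sketch of seeding the backward recursion with dA[L] and running the chain might look like this, reusing the illustrative `linear_activation_backward` helper from above:

```python
def L_model_backward(AL, Y, caches):
    """Backward pass through L layers, seeded with dA[L] for the
    binary cross-entropy loss."""
    grads = {}
    L = len(caches)
    # dA[L] = -Y/AL + (1 - Y)/(1 - AL), element-wise over all m examples
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    for l in reversed(range(1, L + 1)):
        activation = "sigmoid" if l == L else "relu"
        dA, dW, db = linear_activation_backward(dA, caches[l - 1], activation)
        grads["dW" + str(l)] = dW
        grads["db" + str(l)] = db
    return grads                              # dA[0] is computed but discarded
```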

So you've now seen the basic building blocks of both forward propagation as well as backpropagation. Now, if you implement these equations, you will get a correct implementation of forward prop and back prop, and you'll get the derivatives you need. You might be thinking, well, that's a lot of equations; I'm slightly confused; I'm not quite sure I see how this works. If you're feeling that way, my advice is: when you get to this week's programming assignment, you will be able to implement these for yourself, and they'll feel much more concrete. And I know there are a lot of equations, and maybe some of the equations didn't make complete sense, but if you work through the calculus and the linear algebra, which is not easy, so feel free to try, that's actually one of the more difficult derivations in machine learning. It turns out the equations we wrote down are just the calculus equations for computing the derivatives, especially in backprop. But once again, if this still seems a bit abstract or mysterious to you, my advice is that when you've done the programming exercise, it will feel a bit more concrete to you.

Although I have to say, you know, even today when I implement a learning algorithm, sometimes I'm surprised when my learning algorithm implementation works, and it's because a lot of the complexity of machine learning comes from the data rather than from the lines of code. Sometimes you feel like you've implemented a few lines of code, you're not quite sure what they did, but it almost magically works, and that's because so much of the magic is actually not in the piece of code you write, which is often not too long. It's not exactly simple, but it's not ten thousand or a hundred thousand lines of code. You feed it so much data that, even though I've worked in machine learning for a long time, sometimes it still surprises me a bit when my learning algorithm works, because a lot of the complexity of your learning algorithm comes from the data rather than necessarily from your writing thousands and thousands of lines of code.

All right, so that's how you implement deep neural networks, and again, this will become more concrete when you've done the programming exercise. Before moving on, in the next video I want to discuss hyperparameters and parameters. It turns out that when you're training deep nets, being able to organize your hyperparameters well will help you be more efficient in developing your networks. In the next video, let's talk about exactly what that means.