In the last few videos we talked about how to do forward propagation and

back propagation in a neural network in order to compute derivatives.

But back prop as an algorithm has a lot of details and

can be a little bit tricky to implement.

And one unfortunate property is that there are many ways to

have subtle bugs in back prop.

So that if you run it with gradient descent or

some other optimization algorithm, it could actually look like it's working.

And your cost function,

J of theta may end up decreasing on every iteration of gradient descent.

But this could be true even though there might be some

bug in your implementation of back prop.

So it looks like J of theta is decreasing, but

you might just wind up with a neural network that has a higher level of

error than you would with a bug free implementation.

And you might just not know that there was this subtle bug that was giving you

worse performance.

So, what can we do about this?

There's an idea called gradient checking

that eliminates almost all of these problems.

So, today, every time I implement back propagation or

a similar gradient computation on a neural network or

any other reasonably complex model, I always implement gradient checking.

And if you do this, it will help you make sure, and sort of gain high confidence, that

your implementation of forward prop and back prop or whatever is 100% correct.

And from what I've seen, this pretty much eliminates all the problems associated

with a buggy implementation of back prop.

And in the previous videos, I asked you to take on faith that the formulas I gave for

computing the deltas and the D terms and so on

actually do compute the gradients of the cost function.

But once you implement numerical gradient checking, which is the topic of this

video, you'll be able to verify for yourself that the code you're writing

is indeed computing the derivative of the cost function J.
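
The idea previewed here can be sketched in a few lines of Python (this sketch is not part of the lecture; the function `J`, the example parameters, and the tolerance are illustrative assumptions). It approximates each partial derivative with a two-sided difference, (J(theta + eps) - J(theta - eps)) / (2 * eps), and compares that against an analytically computed gradient:

```python
def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite-difference approximation of the gradient of J at theta."""
    grad = []
    for i in range(len(theta)):
        theta_plus = list(theta)
        theta_minus = list(theta)
        theta_plus[i] += eps   # nudge component i up
        theta_minus[i] -= eps  # nudge component i down
        grad.append((J(theta_plus) - J(theta_minus)) / (2 * eps))
    return grad

# Hypothetical example: J(theta) = theta_0^2 + 3*theta_1,
# whose analytic gradient is [2*theta_0, 3].
J = lambda t: t[0] ** 2 + 3 * t[1]
theta = [2.0, 1.0]
analytic = [2 * theta[0], 3.0]
numeric = numerical_gradient(J, theta)

# The two should agree to within a small tolerance; if your back prop
# gradient disagreed with the numerical one, you'd suspect a bug.
assert all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric))
```

In practice, you would substitute your neural network's cost for `J` and your back prop output for `analytic`; the two-sided difference is preferred over a one-sided one because its error shrinks like eps squared rather than eps.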