Here's the cost function that we wrote down in the previous video. What we'd like to do is find parameters theta that minimize J of theta.

In order to use either gradient descent or one of the advanced optimization algorithms, what we need to do is write code that takes as input the parameters theta and computes J of theta and these partial derivative terms.

Remember that the parameters in the neural network are these things, theta superscript l subscript ij. Each of those is a real number, and so these are the partial derivative terms we need to compute.
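To make this concrete, here is a minimal sketch of the interface an optimizer expects: a function that takes the parameters theta and returns the pair (J, gradient). The quadratic cost used here is just a hypothetical stand-in; a real implementation would compute the neural network cost and its partial derivatives via backpropagation.

```python
import numpy as np

def cost_and_grad(theta):
    # Hypothetical stand-in cost J(theta) = sum(theta_i^2);
    # in practice this would be the neural network cost and
    # its gradient computed by backpropagation.
    J = np.sum(theta ** 2)
    grad = 2.0 * theta
    return J, grad

# One step of gradient descent using this interface:
theta = np.array([1.0, -2.0])
J, grad = cost_and_grad(theta)
theta = theta - 0.1 * grad
```

Advanced optimization routines (conjugate gradient, L-BFGS, and so on) consume exactly this same (cost, gradient) pair, which is why writing this one function is the key step.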

In order to compute the cost function J of theta, we just use this formula up here, and so what I want to do for most of this video is focus on how we can compute these partial derivative terms.
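The formula itself isn't reproduced in this transcript, but the cost in question is the regularized cross-entropy cost for a neural network. A minimal sketch, assuming the network's sigmoid outputs h have already been computed by forward propagation, y holds the labels, and thetas is a list of weight matrices whose first column holds the bias weights (which are conventionally excluded from regularization):

```python
import numpy as np

def nn_cost(h, y, thetas, lam):
    # h, y: (m, K) arrays of predictions and labels.
    # Cross-entropy averaged over m examples, summed over K output units,
    # plus an L2 penalty on all non-bias weights.
    m = y.shape[0]
    cross = -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)) / m
    reg = lam / (2.0 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return cross + reg
```

For example, a single example with label 1 and prediction 0.5 gives a cost of log 2, about 0.693.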

Let's start by talking about the case where we have only one training example. So imagine, if you will, that our entire training set comprises only one training example, which is a pair (x, y). Rather than write (x1, y1), I'll just write this one training example as (x, y), and let's step through the sequence of calculations we would do with this one training example.
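The sequence of calculations referred to here is forward propagation. As a sketch, assuming a network with a single hidden layer, sigmoid activations, and bias units prepended to each layer's activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_one_example(x, Theta1, Theta2):
    # a1: input activations with the bias unit prepended
    a1 = np.concatenate(([1.0], x))
    z2 = Theta1 @ a1                          # weighted inputs to hidden layer
    a2 = np.concatenate(([1.0], sigmoid(z2))) # hidden activations, plus bias
    z3 = Theta2 @ a2                          # weighted inputs to output layer
    a3 = sigmoid(z3)                          # h_theta(x), the hypothesis
    return a3
```

With all weights zero, every z is zero, so the output is sigmoid(0) = 0.5; with trained weights, a3 is the network's prediction for the single example (x, y).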