And the difference between the right-hand side and

the left-hand side implementations is that if you look down here,

at this step, if by this time you've already updated theta 0,

then you would be using the new value of theta 0 to compute this derivative term.

And so this gives you a different value of temp1, than the left-hand side, right?

Because you've now plugged in the new value of theta 0 into this equation.

And so the version on the right-hand side is not a correct implementation

of gradient descent, okay?

So I won't say more here about why you need to do the simultaneous updates.

It turns out that the way gradient descent is usually implemented,

which I'll say more about later,

it actually turns out to be more natural to implement the simultaneous updates.

And when people talk about gradient descent,

they always mean simultaneous update.

If you implement the non-simultaneous update,

it turns out it will probably work anyway.

But that algorithm isn't right.

It's not what people refer to as gradient descent;

it's some other algorithm with different properties.

And for various reasons this can behave in slightly stranger ways, and so

what you should do is really implement the simultaneous update of gradient descent.
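To make the distinction concrete, here is a minimal Python sketch (not from the lecture; the function names, data, and cost function are illustrative) contrasting the two implementations for one step of gradient descent on a simple squared-error cost with parameters theta 0 and theta 1:

```python
# Hypothetical sketch: simultaneous vs. non-simultaneous update for one
# gradient descent step on a squared-error cost for h(x) = theta0 + theta1*x.

def d_theta0(theta0, theta1, xs, ys):
    # Partial derivative of the cost with respect to theta0.
    m = len(xs)
    return sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m

def d_theta1(theta0, theta1, xs, ys):
    # Partial derivative of the cost with respect to theta1.
    m = len(xs)
    return sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m

def step_simultaneous(theta0, theta1, xs, ys, alpha):
    # Correct: both derivative terms use the OLD theta0 and theta1,
    # and the parameters are only overwritten afterwards.
    temp0 = theta0 - alpha * d_theta0(theta0, theta1, xs, ys)
    temp1 = theta1 - alpha * d_theta1(theta0, theta1, xs, ys)
    return temp0, temp1

def step_sequential(theta0, theta1, xs, ys, alpha):
    # Incorrect: theta0 is overwritten first, so the theta1 derivative
    # sees the NEW theta0 -- a different algorithm, with a different temp1.
    theta0 = theta0 - alpha * d_theta0(theta0, theta1, xs, ys)
    theta1 = theta1 - alpha * d_theta1(theta0, theta1, xs, ys)
    return theta0, theta1
```

Running both from the same starting point gives the same theta 0 but different values of theta 1, which is exactly the discrepancy described above.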

So, that's the outline of the gradient descent algorithm.

In the next video, we're going to go into the details of the derivative term,

which I wrote up but didn't really define.

And if you've taken a calculus class before and if you're familiar with partial

derivatives and derivatives, it turns out that's exactly what that derivative term

is, but in case you aren't familiar with calculus, don't worry about it.

The next video will give you all the intuitions and

will tell you everything you need to know to compute that derivative term, even if

you haven't seen calculus, or even if you haven't seen partial derivatives before.

And with that, in the next video, hopefully we'll

be able to give you all the intuitions you need to apply gradient descent.