By the way, the number of iterations gradient descent takes to converge

can vary a lot from one application to another. For

one application, gradient descent may converge after just thirty iterations.

For a different application, gradient descent may take 3,000 iterations, and

for another learning algorithm, it may take 3 million iterations.

It turns out to be very difficult to tell in advance how many iterations gradient

descent needs to converge.

It is usually by plotting this sort of plot, plotting the cost function as the

number of iterations increases, and by looking at these plots,

that I try to tell if gradient descent has converged.
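As a minimal sketch of this idea, the snippet below runs gradient descent on a simple linear-regression cost and records J(theta) at every iteration so the resulting history can be plotted. The data, learning rate, and function names here are illustrative assumptions, not from the lecture.

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error cost J(theta)."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))

def gradient_descent(X, y, alpha=0.1, num_iters=100):
    """Run gradient descent, recording J(theta) after each update."""
    theta = np.zeros(X.shape[1])
    J_history = []
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / len(y)  # gradient of the MSE cost
        theta -= alpha * grad
        J_history.append(cost(theta, X, y))
    return theta, J_history

# Illustrative synthetic data: intercept column plus one feature.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)

theta, J_history = gradient_descent(X, y)
# Plotting J_history against the iteration number (e.g. with
# matplotlib's plt.plot(J_history)) gives the convergence plot
# described above.
```

If the curve flattens out, gradient descent has likely converged; if it is still sloping down, more iterations may help.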

It's also possible to come up with an automatic convergence test,

namely an algorithm that tries to tell you if gradient descent has converged.

And here's maybe a pretty typical example of an automatic convergence test.

Such a test may declare convergence if your cost function J(theta)

decreases by less than some small value epsilon,

say 10 to the minus 3, in one iteration.

But I find that choosing an appropriate threshold epsilon is usually pretty difficult.
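The automatic convergence test described above can be sketched as follows. The quadratic cost J(theta) = theta squared and the specific alpha are illustrative assumptions; the test itself is the one from the lecture, declaring convergence when J decreases by less than epsilon in one iteration.

```python
def has_converged(J_prev, J_curr, epsilon=1e-3):
    """Declare convergence if the cost dropped by less than epsilon."""
    return (J_prev - J_curr) < epsilon

# Illustrative example: minimize J(theta) = theta^2 with gradient descent.
theta, alpha = 10.0, 0.1
J_prev = theta ** 2
iterations = 0
while True:
    theta -= alpha * 2 * theta   # gradient of theta^2 is 2*theta
    J_curr = theta ** 2
    iterations += 1
    if has_converged(J_prev, J_curr):
        break                    # cost barely changed: declare convergence
    J_prev = J_curr
```

The difficulty mentioned above shows up directly here: pick epsilon too large and the loop stops early, too small and it may run far longer than needed, which is why inspecting the plot is often more reliable.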

And so in order to check whether gradient descent has converged,

I actually tend to look at plots like these, like this figure on the left,

rather than rely on an automatic convergence test.

Looking at this sort of figure can also tell you, or give you an advance warning,

if maybe gradient descent is not working correctly.

Concretely, if you plot J(theta) as a function of the number of iterations

and you see a figure like this where J(theta) is actually increasing,

then that gives you a clear sign that gradient descent is not working.

And a plot like this usually means that you should be using a smaller learning rate alpha.
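This failure mode can be reproduced in a few lines. Again taking the illustrative cost J(theta) = theta squared, a learning rate that is too large makes J increase on every iteration, while a smaller alpha makes it decrease; the specific alpha values below are assumptions chosen to show the contrast.

```python
def run(alpha, iters=10, theta=1.0):
    """Gradient descent on J(theta) = theta^2, returning the cost history."""
    J_history = []
    for _ in range(iters):
        theta -= alpha * 2 * theta   # gradient step; gradient is 2*theta
        J_history.append(theta ** 2)
    return J_history

# Too large: each update overshoots the minimum and J(theta) blows up.
diverging = run(alpha=1.1)
# Small enough: each update shrinks theta and J(theta) steadily falls.
converging = run(alpha=0.1)
```

Plotting both histories side by side reproduces the two figures discussed: a rising curve for the overly large alpha and a smoothly decreasing one after alpha is reduced.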