And we talked a couple of segments ago about various ways to do this.

But here we're just going to train on one half and test on the other.

And further, I've scaled the data, so

that the original domain here of -10 to 60, or really 0 to 60,

and some kind of range of 0 to 600,

is scaled down to variation around 0.

What I did here is actually a little suspect, because I scaled the data all at

once and then split it into training and test data, and so the test data sort of

influenced how the training data was scaled, which is typically a no-no.

Okay, so you may want to scale them separately.
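To avoid that leakage, the fix is to split first and compute the scaling statistics from the training half only. Here's a minimal NumPy sketch, not the lecture's actual code; the data is made up to mimic the ranges mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data standing in for the lecture's example:
# inputs roughly in 0..60, responses roughly in 0..600.
x = rng.uniform(0, 60, size=100)
y = 10 * x + rng.normal(0, 20, size=100)

# Split first: train on one half, test on the other.
x_train, x_test = x[:50], x[50:]
y_train, y_test = y[:50], y[50:]

# Compute scaling statistics from the training half only...
mu, sigma = x_train.mean(), x_train.std()

# ...then apply them to both halves, so the test data never
# influences how the training data is scaled.
x_train_scaled = (x_train - mu) / sigma
x_test_scaled = (x_test - mu) / sigma
```

The same pattern applies to any preprocessing step that estimates something from the data.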

So how does gradient descent work?

In this plot, what we have is the two parameters for

the regression line, which are theta zero and theta one.

And those are the y-intercept and the slope, essentially, right?

These are the two parameters for a single line.

This is the equation down here.

The i refers to the iteration number, okay?

So this is really iteration zero that we're looking at, or

really the difference between iteration zero and one.

So we started this process off at this point, and we picked that point randomly.

There are various deterministic ways to choose a starting point, but

in general, the fact that you have to decide on a starting point is

one of the weaknesses of gradient descent, okay.

So we start off at a single point, and

then we found the direction of steepest descent and took a step in that direction.

So the cost function here that we're trying to minimize says: take

the response variable y minus the function applied to the input variable,

and here the function is just the regression line of intercept plus slope,

then square that difference and add it up over all the data points.

Right, that's the total amount of error that you incur by trying to

explain all the data with this particular regression line that we started with,

this first starting point.
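That sum-of-squared-errors cost is easy to write down in code. A small sketch, with toy data of my own (not the lecture's):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Sum of squared errors for the line y_hat = theta0 + theta1 * x."""
    residuals = y - (theta0 + theta1 * x)
    return np.sum(residuals ** 2)

# Toy data that lies exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

cost(1.0, 2.0, x, y)   # perfect fit: cost is 0
cost(0.0, 0.0, x, y)   # all error: 1 + 9 + 25 + 49 = 84
```

Some courses divide by the number of points, or by two, to simplify the derivative; that rescales the cost surface but doesn't move its minimum.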

And so we compute that gradient, and

I haven't shown on this slide how to do that.

But you compute that gradient and jump down quite a bit, right, the error goes

down a lot in this first step as we walk from here to here in this parameter space.
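Here's what that single step might look like in NumPy. The learning rate and toy data are my own assumptions, not the lecture's, and the gradient formulas just come from differentiating the sum-of-squares cost with respect to each parameter:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=50)

def cost(theta0, theta1):
    """Sum of squared errors for the line theta0 + theta1 * x."""
    return np.sum((y - (theta0 + theta1 * x)) ** 2)

# Random starting point in (theta0, theta1) parameter space.
theta0, theta1 = rng.normal(size=2)

# Gradient of sum((y - (theta0 + theta1*x))**2):
#   dJ/dtheta0 = -2 * sum(residuals)
#   dJ/dtheta1 = -2 * sum(residuals * x)
residuals = y - (theta0 + theta1 * x)
grad0 = -2.0 * residuals.sum()
grad1 = -2.0 * (residuals * x).sum()

# One step in the direction of steepest descent (negative gradient).
lr = 0.005
new0, new1 = theta0 - lr * grad0, theta1 - lr * grad1

cost(new0, new1) < cost(theta0, theta1)   # the error drops after the step
```

If the learning rate is too large, a step like this can overshoot and the error can actually go up; here it's chosen small enough to descend.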

So now we've gone from one regression line to another regression line.

Here's the first regression line that we started with, and

we just kind of got lucky that it already is kind of in the right direction.

We may not have. It could have gone this way, all right.

So on the next slide, we take another step, and the error goes down a little bit more.

And we take another step toward the minimum over here, toward the center.

And we've got another regression line that has rotated slightly.

So it's closer to what we intuitively think describes the data,

and we can keep going with this.

And we keep stepping down, and the regression line gets better and better,

the error goes down, right, we're finding a local minimum in that error
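That keep-stepping-until-it-stops-improving loop can be sketched like this. Again the data, learning rate, and tolerance `tol` are my own assumed values, not the lecture's:

```python
import numpy as np

def fit_until_converged(x, y, lr=0.005, tol=1e-9, max_iter=10_000):
    """Repeat gradient steps until the error stops improving."""
    theta0, theta1 = 0.0, 0.0
    prev_cost = np.inf
    for _ in range(max_iter):
        residuals = y - (theta0 + theta1 * x)
        cost = np.sum(residuals ** 2)
        # Stop once the decrease in error is negligible: we've
        # settled into a minimum of the error surface.
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Step opposite the gradient of the squared-error cost.
        theta0 += lr * 2.0 * residuals.sum()
        theta1 += lr * 2.0 * (residuals * x).sum()
    return theta0, theta1

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=50)

theta0, theta1 = fit_until_converged(x, y)
# theta0 and theta1 end up near the true intercept 1 and slope 2.
```

For this squared-error cost the surface is a bowl, so the local minimum it finds is also the global one; on less friendly cost surfaces, that isn't guaranteed.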