We have data points here,

and then we try to use

this line to approximate the function we want to learn.

Then we have some errors.

So that's why we write this in terms of y minus the prediction:

the left-hand side minus

the right-hand side gives the error.

So what do we do?

So we just say, "We have an error now."

So we need to make sure whatever we do,

the error should be minimized,

and how can we do that for the data set?

So now we have some approximation.

In this case, I have so many data points,

but I want to find a simple model

to explain all these data points.

In this case, the simplest one I can propose is y equals Beta 0.

So Beta 0 is now a parameter I want to estimate.

So I can just move this line along the y-axis.

You see that for all of these data points here,

the error is small, but it is not good for those over there.

So if I move the line here,

then the problem is reversed:

now it is good for those data points but not for these.

So obviously there's an optimal line,

somewhere in between,

where I can minimize the total error.

So this is like an approximation problem.

I try to minimize the error:

error equals y minus Beta 0.
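The idea above can be sketched numerically. This is a minimal illustration with made-up data points, and it assumes a sum-of-squared-errors criterion for "total error" (the specific criterion is an assumption here, not something stated in the lecture): sliding Beta 0 along the y-axis and keeping the value with the smallest total squared error recovers the sample mean.

```python
import numpy as np

# Hypothetical data points (made up for illustration).
y = np.array([1.0, 2.0, 3.0, 2.5, 1.5])

# Constant model: y ≈ beta0, so error_i = y_i - beta0.
# Slide beta0 along the y-axis and measure the total squared error.
beta0_grid = np.linspace(y.min(), y.max(), 1001)
sse = np.array([np.sum((y - b) ** 2) for b in beta0_grid])

# The beta0 with the smallest total squared error.
beta0_hat = beta0_grid[np.argmin(sse)]

# For squared error, the minimizer coincides with the sample mean.
print(beta0_hat, y.mean())
```

With squared error, the optimal constant is exactly the average of the data, which is why this simplest model is a useful starting point before adding more parameters.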

Now, a more complicated case. In this case,

I also have these data points.

Obviously, I cannot use