We have data points here.

Then we try to use

this line to approximate the function we want to learn.

Then, we have some errors.

Right? So, that's why we write this in terms of Y minus the right-hand side:

the left-hand side minus

the right-hand side is the error.

So what do we do?

So we just say: we have an error now,

so we need to make sure that whatever we do,

the error is minimized.
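The residual idea above can be sketched in a few lines of code. This is an illustration only; the data points and the candidate line coefficients `b0`, `b1` are made-up values, not from the lecture:

```python
import numpy as np

# Hypothetical data points and an assumed candidate line y ≈ b0 + b1*x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 2.1, 3.9, 6.2])
b0, b1 = 0.5, 1.8

# Error: left-hand side (observed y) minus right-hand side (the line).
errors = y - (b0 + b1 * x)

# One common measure of the total error is the sum of squares.
total = np.sum(errors**2)
print(errors)
print(total)
```

Changing `b0` and `b1` changes `total`; fitting means searching for the coefficients that make it as small as possible.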

How can we do that? Well, start from the data sets that we have.

So now we have a problem of approximation.

In this case, I have so many data points.

But I want to find a simple model

to explain all of these data points.

In this case, the simplest

model I can propose is Y equals Beta zero.

So Beta zero is now a parameter I want to estimate.

So I can just move this line along the Y axis.

Right? I can do this.

If I place it here, you see that for all of

these data points the errors are

small, but it's not good for those data points.

So if I move it here, then the problem is reversed.

Then it's good for these data points

but not good for those data points.

So obviously there is an optimal line.

Somewhere here, I can minimize the total error.

So, this is like an approximation:

I try to minimize the error,

where error equals Y minus Beta zero.
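For this constant model, minimizing the sum of squared errors Y minus Beta zero gives Beta zero equal to the mean of the Y-values. A minimal sketch, with hypothetical data (the y-values are made up for illustration):

```python
import numpy as np

# Hypothetical y-values; the constant model y ≈ beta0 ignores x entirely.
y = np.array([1.0, 2.0, 2.5, 4.0, 5.5])

# Minimizing sum((y - beta0)**2) over beta0 yields the sample mean.
beta0 = y.mean()

errors = y - beta0              # error = Y minus Beta zero
total_sq_error = np.sum(errors**2)

print(beta0)
print(total_sq_error)
```

This is the horizontal line "somewhere in the middle" of the points: no other constant gives a smaller sum of squared errors.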

Now a more complicated case. In this case,

I also have these data points.

Obviously, I cannot use