The idea is we get to choose our parameters theta 0, theta 1 so

that h of x, meaning the value we predict on input x,

that this is at least close to the values y for

the examples in our training set, for our training examples.

So in our training set, we're given a number of examples where we know the

size of the house and we know the actual price it was sold for.

So, let's try to choose values for the parameters so that,

at least in the training set, given the x values

in the training set, we make reasonably accurate predictions for the y values.

Let's formalize this.

So linear regression, what we're going to do is,

I'm going to want to solve a minimization problem.

So I'll write: minimize over theta 0, theta 1.

And I want this to be small, right?

I want the difference between h(x) and y to be small.

And one thing I might do is try to minimize the square difference

between the output of the hypothesis and the actual price of a house.

Okay. So let's fill in some details.

You remember that I was using the notation (x(i),y(i))

to represent the ith training example.

So what I want really is to sum over my training set,

a sum from i = 1 to m,

of the square difference between, this is the prediction

of my hypothesis when it is given as input the size of house number i.
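The quantity being described, the sum of squared differences between the hypothesis's predictions and the actual prices over the training set, can be sketched in code. This is a minimal illustration assuming the standard linear hypothesis h(x) = theta 0 + theta 1 * x; the function names and sample data are made up for the example:

```python
def h(theta0, theta1, x):
    # Hypothesis: the predicted price for a house of size x.
    return theta0 + theta1 * x

def sum_squared_error(theta0, theta1, xs, ys):
    # Sum from i = 1 to m of the square difference between the
    # prediction h(x(i)) and the actual price y(i).
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))

# Illustrative training set: (size, price) pairs.
sizes = [2104.0, 1416.0, 1534.0, 852.0]
prices = [460.0, 232.0, 315.0, 178.0]

# Different choices of theta0, theta1 give different totals;
# the minimization problem is to pick the pair that makes this small.
print(sum_squared_error(0.0, 0.2, sizes, prices))
```

A perfect fit would drive the sum to zero, so smaller values mean the predictions are closer to the actual y values on the training set.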