Case Study: Predicting House Prices

From the course by University of Washington

Machine Learning: Regression

From the lesson

Simple Linear Regression

Our course starts from the most basic regression model: just fitting a line to data. This simple model for forming predictions from a single, univariate feature of the data is appropriately called "simple linear regression".

In this module, we describe the high-level regression task and then specialize these concepts to the simple linear regression case. You will learn how to formulate a simple regression model and fit the model to data using both a closed-form solution and an iterative optimization algorithm called gradient descent. Based on this fitted function, you will interpret the estimated model parameters and form predictions. You will also analyze the sensitivity of your fit to outlying observations.

You will examine all of these concepts in the context of a case study of predicting house prices from the square feet of the house.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So this is gonna be our Approach 1.

And this is drawn here on this 3D mesh plot,

where that green surface is the gradient at the minimum.

And what we see is that's where the gradient = 0.

And that red dot is the optimal point that we're gonna be looking at.

Okay, so let's go ahead.

Take this gradient, set it equal to zero.

Solve for W0 and W1.

Those are gonna be our estimates

of our two parameters of our model that define our fitted line.

Remember, that's our goal.
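
For reference, the gradient being set to zero can be written out as below. This is a sketch that assumes the objective is the residual sum of squares (RSS) of a line with intercept w_0 and slope w_1 over N training pairs (x_i, y_i); the "top line" and "bottom line" in what follows are the two components of this vector.

```latex
\begin{align*}
\mathrm{RSS}(w_0, w_1) &= \sum_{i=1}^{N} \bigl(y_i - (w_0 + w_1 x_i)\bigr)^2 \\[4pt]
\nabla \mathrm{RSS}(w_0, w_1) &=
\begin{bmatrix}
-2 \sum_{i=1}^{N} \bigl(y_i - (w_0 + w_1 x_i)\bigr) \\[4pt]
-2 \sum_{i=1}^{N} \bigl(y_i - (w_0 + w_1 x_i)\bigr)\, x_i
\end{bmatrix}
\end{align*}
```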

Okay, so I'm gonna take the top line and I'm gonna do a little bit of algebra.

I'm gonna do it quickly.

And I'm gonna assume that you, if you would like to,

can go through and verify that what I did is correct.

But I'm gonna take the first line, and I'm going to set it equal to 0.

The top line, when you set it

equal to 0, results in W hat 0

being equal to the sum of yi over N

minus W hat 1 times the sum of xi over N.

And these sums go from i equal one to N, just as they did here.
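
The algebra skipped here is short. As a sketch, assuming the top component of the gradient is -2 times the sum of the residuals y_i - (w_0 + w_1 x_i), setting it to zero and solving for the intercept gives:

```latex
\begin{align*}
-2 \sum_{i=1}^{N} \bigl(y_i - (\hat{w}_0 + \hat{w}_1 x_i)\bigr) &= 0 \\
\sum_{i=1}^{N} y_i - N \hat{w}_0 - \hat{w}_1 \sum_{i=1}^{N} x_i &= 0 \\
\hat{w}_0 &= \frac{\sum_{i=1}^{N} y_i}{N} - \hat{w}_1 \frac{\sum_{i=1}^{N} x_i}{N}
\end{align*}
```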

And the reason I'm putting the hats on is that now, these are our solutions.

These are our estimated values of these parameters.

And what we see is our estimate of the intercept for

our regression line.

Well, what is this first term?

This is our average house sales price.

But we're not simply gonna set W0 equal to the average house sales price,

we're gonna subtract off a term involving our estimate of the slope of the line.

That's W hat 1.

And what is this term here,

the one multiplying W hat 1?

Well, this is the average

square footage of the houses in our training data set.

Okay so, there's a nice intuitive structure to our estimate for W hat zero.
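
One way to read this structure, written with the shorthand \bar{x} and \bar{y} for the two averages (notation introduced here for illustration, not used in the lecture): rearranging the intercept estimate shows that the fitted line passes through the point of averages.

```latex
\begin{align*}
\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i, \qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \\[4pt]
\hat{w}_0 = \bar{y} - \hat{w}_1 \bar{x}
\quad\Longleftrightarrow\quad
\bar{y} = \hat{w}_0 + \hat{w}_1 \bar{x}
\end{align*}
```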

But again this is in terms of W hat 1, so we have to provide

another equation to actually get at a solution. And so

if we look at the bottom term of this gradient vector

(I shouldn't call it a line; I'll call the first one the top term

of the gradient, and this is the bottom term of the gradient),

and we set it equal to 0,

we're gonna get the sum of yi xi, minus W0 hat times the sum of xi,

minus W1 hat times the sum of

xi squared, equal to 0.

And now what I'm gonna do is I'm gonna take W0 hat,

my equation for it and I'm gonna plug it in.

And so what I end up getting out,

once I plug W0 hat in, in terms of W1 hat, and solve for W1 hat,

is that W1 hat is equal to the sum of yi xi,

minus sum yi times sum xi, over N,

divided by the sum of the squared xi,

minus sum xi times sum xi, divided by N.

Okay.
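
The substitution step can be spelled out as follows. This sketch assumes the bottom component of the gradient, once set to zero and divided by -2, is the sum of y_i x_i minus the intercept times the sum of x_i minus the slope times the sum of x_i squared, and it plugs in the intercept expression obtained from the top component.

```latex
\begin{align*}
\sum_{i=1}^{N} y_i x_i - \hat{w}_0 \sum_{i=1}^{N} x_i - \hat{w}_1 \sum_{i=1}^{N} x_i^2 &= 0 \\
\sum_{i=1}^{N} y_i x_i
  - \left( \frac{\sum_{i} y_i}{N} - \hat{w}_1 \frac{\sum_{i} x_i}{N} \right) \sum_{i=1}^{N} x_i
  - \hat{w}_1 \sum_{i=1}^{N} x_i^2 &= 0 \\
\hat{w}_1 \left( \sum_{i=1}^{N} x_i^2 - \frac{\sum_{i} x_i \sum_{i} x_i}{N} \right)
  &= \sum_{i=1}^{N} y_i x_i - \frac{\sum_{i} y_i \sum_{i} x_i}{N} \\
\hat{w}_1 &= \frac{\sum_{i} y_i x_i - \dfrac{\sum_{i} y_i \sum_{i} x_i}{N}}
                 {\sum_{i} x_i^2 - \dfrac{\sum_{i} x_i \sum_{i} x_i}{N}}
\end{align*}
```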

Anyway, the point is that it has a closed form that's pretty straightforward to go and

compute. And what we wanna note

is that to compute W hat 1, and then plug that in and

compute W hat 0, we need to compute just a couple of terms.

We need to compute the sum over all of our outputs yi,

the sum over all of our inputs xi, and

then a couple of other terms that are products

of our inputs and outputs: the sum of yi xi and the sum of xi squared.

So we need to compute just four different terms.

Plug them into these equations and we get out what our W hat 0 and W hat 1 are:

the optimal values that minimize our residual sum of squares.
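
The recipe just described (compute four sums, then plug them in) can be sketched in a few lines of Python. The function name `simple_linear_regression` and the toy data are hypothetical and chosen only for illustration; the formulas are the two closed-form estimates stated in the lecture.

```python
def simple_linear_regression(x, y):
    """Closed-form estimates (w0_hat, w1_hat) for the line y = w0 + w1 * x."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    # The four sums the closed-form solution needs.
    sum_y = sum(y)
    sum_x = sum(x)
    sum_yx = sum(yi * xi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    denominator = sum_xx - sum_x * sum_x / n
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    # Slope first, then the intercept in terms of the slope.
    w1_hat = (sum_yx - sum_y * sum_x / n) / denominator
    w0_hat = sum_y / n - w1_hat * sum_x / n
    return w0_hat, w1_hat


# Made-up data lying exactly on the line y = 1 + 2x.
x_toy = [0.0, 1.0, 2.0, 3.0]
y_toy = [1.0, 3.0, 5.0, 7.0]
w0_hat, w1_hat = simple_linear_regression(x_toy, y_toy)
print(w0_hat, w1_hat)
```

Because the toy points lie exactly on a line, the estimates recover that line's intercept and slope; with real house data the inputs would be square footage and the outputs sale prices.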

The take-home message here is that one way we can solve this optimization problem

of minimizing residual sum of squares is to take the gradient, set it equal to zero, and

this is the result.
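
As a sanity check on that message, the sketch below evaluates the RSS gradient at the closed-form estimates and at an arbitrary other point. The helper names `closed_form_fit` and `rss_gradient` and the data values are hypothetical, and the gradient expression assumes the RSS objective for a line with intercept w0 and slope w1.

```python
def closed_form_fit(x, y):
    """Closed-form intercept and slope from the four sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_yx = sum(yi * xi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    w1 = (sum_yx - sum_y * sum_x / n) / (sum_xx - sum_x * sum_x / n)
    w0 = sum_y / n - w1 * sum_x / n
    return w0, w1


def rss_gradient(w0, w1, x, y):
    """Gradient of RSS(w0, w1) = sum_i (y_i - (w0 + w1 * x_i))**2."""
    residuals = [yi - (w0 + w1 * xi) for xi, yi in zip(x, y)]
    grad_w0 = -2.0 * sum(residuals)
    grad_w1 = -2.0 * sum(r * xi for r, xi in zip(residuals, x))
    return grad_w0, grad_w1


# Made-up noisy data, roughly following y = 2x.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]

w0_fit, w1_fit = closed_form_fit(x_data, y_data)
grad_at_fit = rss_gradient(w0_fit, w1_fit, x_data, y_data)
grad_at_origin = rss_gradient(0.0, 0.0, x_data, y_data)

# Both components vanish (up to floating-point error) at the fit,
# but not at an arbitrary point such as (0, 0).
print(grad_at_fit, grad_at_origin)
```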

[MUSIC]
