Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Simple Linear Regression

Our course starts from the most basic regression model: Just fitting a line to data. This simple model for forming predictions from a single, univariate feature of the data is appropriately called "simple linear regression".

In this module, we describe the high-level regression task and then specialize these concepts to the simple linear regression case. You will learn how to formulate a simple regression model and fit the model to data using both a closed-form solution as well as an iterative optimization algorithm called gradient descent. Based on this fitted function, you will interpret the estimated model parameters and form predictions. You will also analyze the sensitivity of your fit to outlying observations.

You will examine all of these concepts in the context of a case study of predicting house prices from the square feet of the house.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Now we're gonna discuss an important issue: the influence of what are called high

leverage points.

And these are points that can be considered influential observations.

But to have this discussion, I think it's really useful to just look at some data.

So to start with, let's fire up GraphLab Create and then let's load some data.

And for this, we're gonna load our data into our SFrame.

And I'm gonna assume that you guys are familiar with a lot of what I'm doing here

from the foundations course where we went through pretty slowly,

a lot of the GraphLab Create-related code that we're seeing here.

And I wanna emphasize that throughout this course you'll actually learn how to

implement these methods, but for the sake of this demo and other demos in

this course, we're gonna just use GraphLab Create to keep the discussion

at a much higher level about the concepts that we're trying to convey.

Okay, so the data set that we're looking at here in this example is

a Philadelphia housing data set, where in particular,

our data set consists of the average house price in

a whole collection of towns in the greater Philadelphia region.

And we also have information about crime rates in each one of these towns,

as well as how far each town is from Center City,

which is the downtown region of Philadelphia.

So, let's start analyzing this data.

And to do this, what we're going to start with is just making a scatter plot

of the relationship between average house sale prices and crime rates.

Okay, so here we are. We're gonna just use

a .show command to show a scatter plot where,

on the x axis, we have crime rate.

And each one of these little blue circles or cyan,

light blue circles is a different town in our dataset.

And we have a total of 98 different towns.

And on the y axis what we have is the average house value in that town.

Okay.

And so, from this you can see that there's some relationship between

our crime rate and our house sales price.

In particular, we see that for towns that have lower crime rates,

they tend to have higher house values and vice versa.

So that makes a lot of sense.

So let's try and actually fit a relationship between crime rate and

house price.

So we're gonna go through and fit this regression model doing our standard

.linear_regression command, taking our target, or

our output, to be that house price in that region, and

taking our features to just be a single feature, which is crime rate in that town.
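As a rough illustration of what such a fit computes, here is a minimal sketch of the closed-form least squares solution for simple linear regression. It does not use GraphLab Create, and the numbers are made up for illustration; they are not the Philadelphia data from the demo. The names `fit_simple_linear_regression`, `crime_rate`, and `house_price` are assumptions introduced here.

```python
# Synthetic, made-up (crime rate, house price) pairs for illustration only.
# These are NOT the Philadelphia data used in the lecture.
crime_rate = [10.0, 20.0, 30.0, 40.0, 50.0]
house_price = [200000.0, 180000.0, 170000.0, 150000.0, 140000.0]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_linear_regression(crime_rate, house_price)
print("intercept:", intercept)
print("slope:", slope)  # negative here: price falls as crime rate rises
```

With a single feature, the fitted model is just an intercept and a slope, so a prediction for any crime rate is `intercept + slope * rate`.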

Okay, so now what we've done is we've output this to something called

crime_model, and now let's look at what this fit resulted in.

So I'll just import our matplotlib

library to start making some plots here, and

what we're gonna show is we're gonna show a plot of

the observations that we showed before as well as our fitted line.

So our fitted simple linear regression model is this green line.

So these are our predictions of house values for each crime

rate going from 0 up to somewhere around, I don't know, 360 or something like this.

So we do see a trend where house value

decreases as crime increases: the slope of this line is negative.

But one thing that we pretty immediately see is there's an observation out here,

there's this blue dot which has an extremely high crime rate, but

the house value is, I mean, it's low-ish, but it's not as

low as the house values in other regions that have significantly lower crime rates.

And we see that our line, our fitted line,

is getting pulled towards this observation that's all the way out here.

So, at least from the picture, it looks like it's

being heavily influenced by this one observation.

And that observation is really, really far out on the x axis.
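One way to check this kind of visual impression is to refit the line without the far-out point and compare slopes, and to compute each point's leverage. The sketch below does both on made-up data, assuming the standard leverage formula for simple linear regression, h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The data values and the helper names `fit_line` and `leverages` are hypothetical, not taken from the demo.

```python
def fit_line(x, y):
    """Closed-form least squares fit; returns (intercept, slope)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def leverages(x):
    """Leverage of each point in a simple linear regression with intercept."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]


# Synthetic data for illustration only (NOT the Philadelphia data).
# The last point mimics a town sitting very far out on the x axis.
crime_rate = [10.0, 20.0, 30.0, 40.0, 50.0, 360.0]
house_price = [200000.0, 180000.0, 170000.0, 150000.0, 140000.0, 100000.0]

_, slope_all = fit_line(crime_rate, house_price)
_, slope_without = fit_line(crime_rate[:-1], house_price[:-1])
h = leverages(crime_rate)

print("slope with the far-out point:   ", slope_all)
print("slope without the far-out point:", slope_without)
print("leverage of each point:", [round(v, 3) for v in h])
```

In this toy example the far-out point has by far the largest leverage, and dropping it changes the slope substantially, which is the pattern the plot suggests. Whether the same holds for the real data has to be checked on that data.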