Case Study: Predicting House Prices

From the course by University of Washington

Machine Learning: Regression

From the lesson

Nearest Neighbors & Kernel Regression

Up to this point, we have focused on methods that fit parametric functions, like polynomials and hyperplanes, to the entire dataset. In this module, we instead turn our attention to a class of "nonparametric" methods. These methods allow the complexity of the model to increase as more data are observed, and result in fits that adapt locally to the observations.

We start by considering a simple and intuitive example of nonparametric methods, nearest neighbor regression: the prediction for a query point is based on the outputs of the most related observations in the training set. This approach is extremely simple, but can provide excellent predictions, especially for large datasets. You will deploy algorithms to search for the nearest neighbors and form predictions based on the discovered neighbors. Building on this idea, we turn to kernel regression. Instead of forming predictions based on a small set of neighboring observations, kernel regression uses all observations in the dataset, but the impact of these observations on the predicted value is weighted by their similarity to the query point. You will analyze the theoretical performance of these methods in the limit of infinite training data, and explore the scenarios in which these methods work well versus struggle. You will also implement these techniques and observe their practical behavior.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So, now let's look at k nearest neighbors in practice.

We're gonna look at exactly the same data set we looked at for

one nearest neighbor, where we showed that really noisy fit.

And what we see is that things look a lot better here.

So, in this yellow box, what we're showing are all the nearest neighbors for

a specific target point x0.

So, there's a red line going up from our target, our query point,

to a yellow box, and this yellow box has all the nearest neighbor observations

highlighted as red circles instead of grey circles for this one query point.

And if we think about averaging the values of all these points, that results

in the value of the green line at this target point x0.

And we can repeat this for every value of our input space, and

that's what's gonna give us this green curve.
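The averaging step described above can be sketched in a few lines. This is only a minimal illustration, assuming a one-dimensional input and absolute distance; the function name `knn_predict`, the toy data, and the choice of k = 3 are all made up for the example and are not from the lecture.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Average the outputs of the k training points closest to x_query (1-D inputs)."""
    distances = np.abs(x_train - x_query)
    # A stable sort keeps the lower index first when two points are equally close.
    nearest = np.argsort(distances, kind="stable")[:k]
    return y_train[nearest].mean()

# Hypothetical toy data, invented for this sketch.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])

# Prediction at a single query point x0.
prediction_at_x0 = knn_predict(x_train, y_train, 2.0, k=3)

# Repeating the prediction over a grid of inputs traces out the fitted curve.
x_grid = np.linspace(0.0, 5.0, 11)
curve = np.array([knn_predict(x_train, y_train, x0, k=3) for x0 in x_grid])
```

Here `prediction_at_x0` plays the role of the green line's value at the target point, and `curve` plays the role of the green curve.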

And so what we see here is that this fit looks much more reasonable

than that really noisy one nearest neighbor fit we showed before.

But one thing that I do want to point out is that we get these boundary effects, and

the same is true if we have limited data in any region of the input space. But

in particular at the boundary, the reason that we get these constant fits

is the fact that our nearest neighbors are exactly the same set of points for

all these different input points.

Because if I'm all the way over at the boundary,

all my nearest neighbors are the k points to either the right or

left of me, depending on which boundary I'm at.

And then if I shift over one point, I still have

the same set of nearest neighbors, obviously except for

the one point that is the query point. But aside from that, it's basically the same set

of values that you're using at each one of these points along the boundary. But

overall, we see that we've been able to cope with some of the noise

that we had in the one nearest neighbor situation a lot better than we did before.
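The boundary effect can be checked numerically. The sketch below assumes evenly spaced one-dimensional inputs, invented output values, and k = 4; the helper `knn_neighbors` is a hypothetical name. Unlike the plot described in the lecture, the query points here are not removed from the training set, so the neighbor sets near the boundary come out exactly identical.

```python
import numpy as np

def knn_neighbors(x_train, x_query, k):
    """Return the sorted indices of the k training inputs closest to x_query."""
    order = np.argsort(np.abs(x_train - x_query), kind="stable")
    return np.sort(order[:k])

# Hypothetical data: ten evenly spaced inputs with invented outputs.
x_train = np.arange(10.0)
y_train = np.array([2.0, 1.0, 3.0, 2.5, 4.0, 3.0, 5.0, 4.5, 6.0, 5.5])
k = 4

# Query points near the left boundary all share the same k neighbors,
# so the averaged prediction is constant across them.
boundary_queries = [0.0, 0.5, 1.0, 1.4]
boundary_sets = [knn_neighbors(x_train, x0, k) for x0 in boundary_queries]
boundary_preds = [y_train[idx].mean() for idx in boundary_sets]

# In the interior, the neighbor window slides along with the query point,
# so the prediction changes.
interior_queries = [4.6, 5.6]
interior_sets = [knn_neighbors(x_train, x0, k) for x0 in interior_queries]
interior_preds = [y_train[idx].mean() for idx in interior_sets]
```

All four boundary queries use the training points at indices 0 through 3, which is why the fit is flat there, while the two interior queries use different windows.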

But beyond the boundary issues, there's another fairly important issue with the k

nearest neighbors fit, which is the fact that you get discontinuities.

So, if you look closely at this green line,

it has a bunch of jumps between values.

And the reason you get those jumps is the fact that as you shift from one input

to the next input, a nearest neighbor is either completely in or out of the window.

So, there's this effect where all of a sudden a nearest neighbor changes, and

then you're gonna get a jump in the predicted value.
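One way to see these jumps is to evaluate the fit on a fine grid and look for places where the prediction changes. This is a rough sketch under assumed conditions: five invented training points, k = 2, and a grid spacing of 0.01. The name `knn_predict` is a hypothetical helper.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Average the outputs of the k training points closest to x_query (1-D inputs)."""
    nearest = np.argsort(np.abs(x_train - x_query), kind="stable")[:k]
    return y_train[nearest].mean()

# Hypothetical toy data, invented for this sketch.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, 4.0, 2.0, 6.0, 3.0])
k = 2

# Evaluate the fit on a fine grid of query points.
x_grid = np.linspace(0.0, 4.0, 401)
fit = np.array([knn_predict(x_train, y_train, x0, k) for x0 in x_grid])

# The fit is piecewise constant: it only changes where one neighbor
# leaves the window and another enters.
jump_mask = ~np.isclose(np.diff(fit), 0.0)
jump_locations = x_grid[1:][jump_mask]
distinct_levels = np.unique(fit)
```

With this data the fit takes only a handful of distinct values (`distinct_levels`), and `jump_locations` marks the few grid points where the neighbor set changes. Everywhere else the prediction is flat.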

And so the overall effect on predictive accuracy

might actually not be that significant.

But there's some reasons we don't like fits with these types of discontinuities.

First, visually maybe it's not very appealing.

But let's think in terms of our housing application,

where what this means is that if we go from a house of, for example,

2640 square feet to a house of 2641 square feet,

to you, that probably wouldn't make much of a difference in assessing the value. But

if you have a discontinuity between these two points, what it means is there's a jump

in the predicted value.

So, I take my house at some predicted value, I just add one square foot, and

the predicted value would have perhaps a significant increase or decrease.

And so that is not very attractive in applications like housing.

And more generally, we just don't tend to believe these types of fits.
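To make the housing example concrete, here is a small sketch with invented numbers. The square footages and prices below are hypothetical and chosen so that the neighbor set happens to change between 2640 and 2641 square feet; k = 3 and the helper name `knn_predict` are also assumptions of the example.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Average the outputs of the k training points closest to x_query (1-D inputs)."""
    nearest = np.argsort(np.abs(x_train - x_query), kind="stable")[:k]
    return y_train[nearest].mean()

# Hypothetical training houses: square feet and sale price in dollars.
sqft = np.array([2000.0, 2300.0, 2600.0, 2681.0, 2981.0, 3400.0])
price = np.array([400000.0, 450000.0, 520000.0, 540000.0, 630000.0, 700000.0])
k = 3

# At 2640 sq ft the three nearest houses are 2600, 2681, and 2300 sq ft.
pred_2640 = knn_predict(sqft, price, 2640.0, k)

# At 2641 sq ft the 2300 sq ft house drops out and the 2981 sq ft house enters.
pred_2641 = knn_predict(sqft, price, 2641.0, k)

# The size of the jump caused by adding a single square foot.
jump = pred_2641 - pred_2640
```

In this made-up data, one extra square foot swaps a $450,000 neighbor for a $630,000 one, so the averaged prediction jumps by about $60,000, even though nothing meaningful about the house has changed.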
