Case Study: Predicting Housing Prices

From the course by University of Washington

Machine Learning: Regression

From the lesson

Nearest Neighbors & Kernel Regression

Up to this point, we have focused on methods that fit parametric functions, like polynomials and hyperplanes, to the entire dataset. In this module, we instead turn our attention to a class of "nonparametric" methods. These methods allow the complexity of the model to increase as more data are observed, and result in fits that adapt locally to the observations.

We start by considering a simple and intuitive example of nonparametric methods, nearest neighbor regression: the prediction for a query point is based on the outputs of the most related observations in the training set. This approach is extremely simple, but can provide excellent predictions, especially for large datasets. You will deploy algorithms to search for the nearest neighbors and form predictions based on the discovered neighbors. Building on this idea, we turn to kernel regression. Instead of forming predictions based on a small set of neighboring observations, kernel regression uses all observations in the dataset, but the impact of these observations on the predicted value is weighted by their similarity to the query point. You will analyze the theoretical performance of these methods in the limit of infinite training data, and explore the scenarios in which these methods work well versus struggle. You will also implement these techniques and observe their practical behavior.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So now let's step back and discuss some important theoretical and

practical aspects of K-nearest neighbors and kernel regression.

If you remember the title of this module it was Going Nonparametric, and we've yet

to mention what that means.

What is a nonparametric approach?

Well, K-nearest neighbors and

kernel regression are examples of nonparametric approaches.

And the general goal of a nonparametric approach is

to be really flexible in how you're defining f of x and

in general you want to make as few assumptions as possible.

And the really key thing that defines a nonparametric method is that

the complexity of the fit can grow as you get more data points.

We've definitely seen that with K-nearest neighbors and kernel regression,

in particular the fit is a function of how many observations you have.

But these are just two examples of nonparametric methods you might use for

regression.

There are lots of other choices.

Things like splines, and

trees which we'll talk about in the classification course, and locally

weighted structured versions of the types of regression models we've talked about.

So nonparametrics is all about this idea of having the complexity grow with

the number of observations.

So now let's talk about what's the limiting behavior

of nearest neighbor regression as you get more and more data.

And to start with, let's just assume that we get completely noiseless data.

So every observation we get lies exactly on the true function.

Well in this case, the mean squared error of one nearest neighbor

regression goes to zero, as you get more and more data.

But let's just remember what mean squared error is and if you remember from

a couple modules ago, we talked about this bias-variance trade-off and

that mean squared error is the sum of bias squared plus variance.

So having mean squared error go to zero means that both bias and

variance are going to zero.
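
As a reminder, that decomposition can be written out as below. The notation is an assumption for illustration rather than the slides' own: f is the true function, \hat{f} is the fit learned from a random training set, and the expectation is taken over training sets at a fixed input x.

```latex
\mathrm{MSE}\big[\hat{f}(x)\big]
  = \mathbb{E}\Big[\big(\hat{f}(x) - f(x)\big)^2\Big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
```

Both terms on the right are nonnegative, so the left-hand side can only go to zero if each of them goes to zero.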

So to motivate this visually, let's just look at a couple of movies.

Here, in this movie, I'm showing what the one nearest neighbor fit looks like

as we're getting more and more data.

So, remember the blue line is our true curve.

The green line is our current nearest neighbor fit based on some set of

observations that are gonna lie exactly on the true function at that blue curve.

Okay, so here's our fit changing as we get more and

more data and what you see is that it's getting closer, and closer,

and closer, and closer, and closer to the true function.

And hopefully you can believe that in the limit of getting an infinite number of

observations spread over our input space this nearest neighbor

fit is gonna lie exactly on top of the true function.

And that's true for every possible data set with an

infinite number of observations that we could get.
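
To make the noiseless claim concrete, here is a minimal sketch in Python with NumPy. It is not the code behind the movies: the function `true_function` (a piece of a sinusoid, `sin(4x)` on [0, 1]), the helper names, the seed, and the sample sizes are all assumptions chosen for illustration. It draws noiseless observations, forms a one nearest neighbor fit, and measures the mean squared error against the true curve on a grid.

```python
import numpy as np


def true_function(x):
    # Hypothetical stand-in for the lecture's "true curve": part of a sinusoid.
    return np.sin(4.0 * x)


def one_nn_predict(x_train, y_train, x_query):
    # For each query point, return the output of the closest training input.
    nearest = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]


def noiseless_one_nn_mse(n_obs, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=n_obs)
    y_train = true_function(x_train)  # noiseless: points lie on the true curve
    x_query = np.linspace(0.0, 1.0, 500)
    y_hat = one_nn_predict(x_train, y_train, x_query)
    return float(np.mean((y_hat - true_function(x_query)) ** 2))


for n_obs in (10, 100, 1000):
    print(n_obs, noiseless_one_nn_mse(n_obs))
```

Under these assumptions, the printed error should shrink as the number of observations grows, which is the behavior described above.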

In contrast, if we look at just doing a standard quadratic fit,

just our standard least squares fit we talked about before.

No matter how much data we get, there's always gonna be some bias.

So we can see this here, where especially

at this point of curvature we see that this green fit, even as we get lots and

lots of observations, is never matching up to that true blue curve.

And that's because that true blue curve, that's actually a part of a sinusoid.

We've just zoomed in on a certain region of that sinusoid.

And so

this quadratic fit is never exactly gonna describe what a sinusoid is describing.

So this is what we talked about before, about the bias that's inherent

in our model, even if we have no noise.

So even if we eliminate this noise,

we still have the fact that our true error, as we're getting more and more and

more data, is never gonna go exactly to zero.

Unless of course the data were generated from exactly the model

that we're using to fit the data.

But in most cases, for example, maybe you have data that looks like the following.

So this is our noiseless data, and remember, this was all for a fixed

model complexity.

If we constrain our model to be, say, a quadratic,

then maybe this will be our best quadratic fit.

And no matter how many observations I give you from this more complicated function,

this quadratic is never gonna have zero bias.
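
Here is a small sketch of that point for a fixed-complexity model. It assumes the true curve is `sin(4x)` on [0, 1] (a hypothetical stand-in for the sinusoid in the lecture) and uses NumPy's `polyfit` for the least squares quadratic. The function names, seed, and sample sizes are made up for illustration.

```python
import numpy as np


def true_function(x):
    # Hypothetical stand-in for the lecture's "true curve": part of a sinusoid.
    return np.sin(4.0 * x)


def quadratic_fit_mse(n_obs, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=n_obs)
    y_train = true_function(x_train)  # noiseless observations
    # Least squares fit of a degree-2 polynomial (fixed model complexity).
    coeffs = np.polyfit(x_train, y_train, deg=2)
    x_query = np.linspace(0.0, 1.0, 500)
    y_hat = np.polyval(coeffs, x_query)
    return float(np.mean((y_hat - true_function(x_query)) ** 2))


for n_obs in (10, 1000, 100000):
    print(n_obs, quadratic_fit_mse(n_obs))
```

Under these assumptions, the error should level off at a value above zero instead of continuing to shrink, because no quadratic can match this curve exactly. That leftover error is the bias.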

In contrast, let's switch colors here, so

that we can draw our one nearest neighbor fit.

Our one nearest neighbor, as we get more and more data,

it's fitting these constants locally to each observation.

And as we get more and more and

more data, the fit is gonna look exactly like the true curve.

And so when we talk about our true error with an increasing number of observations,

and this is a plot of

true error for one nearest neighbor,

it's going to go to zero for noiseless data.

But now let's talk about the noisy case.

This is the case that we're typically faced with in most applications.

And in this case what you can say is that the mean squared error

of nearest neighbor regression goes to zero

if you allow the number of neighbors, or the k in our nearest neighbor regression,

to increase with the number of observations as well.

Because if you think about getting tons and tons and tons of observations,

if you keep k fixed, you're just gonna be looking at a very,

very, very local region of your input space.

And you're gonna have a lot of noise introduced from that, but if you allow

k to grow, it's gonna smooth over the noise that's being introduced.

So let's look at a visualization of this.

So here what we're showing is the same true function we've shown throughout

this module, but we're showing tons of observations, all these great dots.

But they're noisy observations, they're no longer lying exactly on that blue curve.

That's why they're this cloud of blue points.

And we see that our one nearest neighbor fit is very, very noisy.

Okay, it has this wild behavior, because like we discussed before,

one nearest neighbor is very sensitive to noise in the data.

But in contrast, if we look at a large k, so here we're looking at k equals 200.

So our 200 nearest neighbors fit, it looks much, much better.

So you can imagine that as you get more and more observations,

if you're allowing k to grow, you can smooth over the noise being introduced by

each one of these observations, and have the mean squared error going to zero.
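
The noisy case can be sketched the same way. Everything specific here is an assumption for illustration: the true curve `sin(4x)` on [0, 1], Gaussian noise with standard deviation 0.3, the helper names, and in particular the rule `k = sqrt(N)`, which is just one simple way of letting k grow with the number of observations and is not something the lecture prescribes.

```python
import numpy as np


def true_function(x):
    # Hypothetical stand-in for the lecture's "true curve": part of a sinusoid.
    return np.sin(4.0 * x)


def knn_predict(x_train, y_train, x_query, k):
    # Distances from every query point to every training input.
    dists = np.abs(x_query[:, None] - x_train[None, :])
    # Indices of the k closest training inputs for each query (unordered).
    nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]
    # Prediction is the average output of those k neighbors.
    return y_train[nearest].mean(axis=1)


def noisy_knn_mse(n_obs, k, noise_std=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=n_obs)
    y_train = true_function(x_train) + rng.normal(0.0, noise_std, size=n_obs)
    x_query = np.linspace(0.0, 1.0, 200)
    y_hat = knn_predict(x_train, y_train, x_query, k)
    return float(np.mean((y_hat - true_function(x_query)) ** 2))


for n_obs in (100, 1000, 10000):
    k_grow = int(np.sqrt(n_obs))  # assumed growth rule, for illustration only
    print(n_obs, noisy_knn_mse(n_obs, k=1), noisy_knn_mse(n_obs, k=k_grow))
```

Under these assumptions, the k = 1 column should stay near the noise variance no matter how much data arrives, while the growing-k column should keep shrinking.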

But in contrast, again, if we look at just our standard least squares regression

here in this case of a quadratic fit, we're always gonna have bias.

So nothing's different by having introduced noise.

It, if anything, will just make things worse.

[MUSIC]
