Case Study: Predicting House Prices


From the course by University of Washington

Machine Learning: Regression



From the lesson

Ridge Regression

You have examined how the performance of a model varies with increasing model complexity, and can describe the potential pitfall of complex models becoming overfit to the training data. In this module, you will explore a very simple but extremely effective technique for automatically coping with this issue. This method is called "ridge regression". You start out with a complex model, but now fit it in a manner that incorporates not only a measure of fit to the training data, but also a term that biases the solution away from overfitted functions. To this end, you will explore symptoms of overfitted functions and use them to define a quantitative measure for your revised optimization objective. You will derive both a closed-form and a gradient descent algorithm for fitting the ridge regression objective; these are small modifications of the original algorithms you derived for multiple regression. To select the strength of the bias away from overfitting, you will explore a general-purpose method called "cross validation".

You will implement both cross validation and gradient descent to fit a ridge regression model and select the regularization constant.
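
As a hedged sketch (not the course's own code), assume the ridge objective is RSS(w) + lambda * ||w||^2 with every coefficient penalized and no separate intercept. Its gradient is -2 H^T (y - H w) + 2 * lambda * w, so each gradient step is the usual least-squares update applied after shrinking w by a factor (1 - 2 * step_size * lambda). The helper names `ridge_closed_form` and `ridge_gradient_descent`, the stopping rule, and the synthetic data are all hypothetical.

```python
import numpy as np

def ridge_closed_form(H, y, l2_penalty):
    """Exact minimizer of RSS(w) + l2_penalty * ||w||^2: (H^T H + lambda I)^{-1} H^T y."""
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + l2_penalty * np.eye(d), H.T @ y)

def ridge_gradient_descent(H, y, l2_penalty, step_size, max_iter=10_000, tol=1e-10):
    """Gradient descent on RSS(w) + l2_penalty * ||w||^2 (assumed objective, no intercept).

    Update: w <- (1 - 2*step_size*l2_penalty) * w + 2*step_size * H^T (y - H w),
    i.e. shrink w, then take the ordinary least-squares gradient step.
    """
    w = np.zeros(H.shape[1])
    for _ in range(max_iter):
        residual = y - H @ w
        w_new = (1 - 2 * step_size * l2_penalty) * w + 2 * step_size * (H.T @ residual)
        if np.linalg.norm(w_new - w) < tol:  # stop once the step becomes negligible
            return w_new
        w = w_new
    return w

# Usage on made-up data: both routes should agree for a small enough step size.
rng = np.random.default_rng(0)
H = rng.normal(size=(50, 3))
y = H @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w_exact = ridge_closed_form(H, y, l2_penalty=1.0)
w_gd = ridge_gradient_descent(H, y, l2_penalty=1.0, step_size=1e-3)
print(np.allclose(w_exact, w_gd, atol=1e-6))
```

The step size has to be small relative to the largest eigenvalue of H^T H + lambda I for the iteration to converge; 1e-3 is simply a value that works for this synthetic example.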

- Emily Fox, Amazon Professor of Machine Learning, Statistics
- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Well, we've motivated analytically how the coefficients we get when solving this ridge regression problem change for different settings of lambda. Specifically, we saw that when lambda is 0, we get our least squares solution. When lambda goes to infinity, we get very, very small coefficients approaching 0. In between, we get some other set of coefficients, and we then explored this experimentally in the polynomial regression demo.
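
A minimal NumPy sketch of these two limits, assuming the closed form w_hat = (H^T H + lambda I)^{-1} H^T y with every coefficient penalized (no separate intercept). `ridge_closed_form` is a hypothetical helper name and the data are synthetic; the point is only that lambda = 0 reproduces least squares and a huge lambda drives w_hat toward 0.

```python
import numpy as np

def ridge_closed_form(H, y, l2_penalty):
    # Hypothetical helper: w_hat = (H^T H + lambda * I)^{-1} H^T y
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + l2_penalty * np.eye(d), H.T @ y)

rng = np.random.default_rng(1)
H = rng.normal(size=(100, 4))                       # made-up feature matrix
y = H @ np.array([3.0, -1.0, 2.0, 0.5]) + rng.normal(size=100)

w_ls, *_ = np.linalg.lstsq(H, y, rcond=None)        # ordinary least squares

for lam in [0.0, 1.0, 100.0, 1e6]:
    w = ridge_closed_form(H, y, lam)
    # The norm of w_hat shrinks as lambda grows.
    print(lam, np.round(w, 3), np.linalg.norm(w))

print(np.allclose(ridge_closed_form(H, y, 0.0), w_ls))  # lambda = 0 matches least squares
```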

One interesting thing to draw is what's called the coefficient path for ridge regression. It shows how the coefficients change as we vary lambda all the way from 0 up toward infinity. In other words, how does my solution change as a function of lambda?
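
A coefficient path can be sketched by solving the ridge problem on a grid of lambda values and stacking the solutions. This assumes the same closed form as before; `ridge_coefficient_path`, the log-spaced grid, and the 8 synthetic features (standing in for, not reproducing, the housing data) are illustrative choices.

```python
import numpy as np

def ridge_coefficient_path(H, y, l2_penalties):
    """Return an array of shape (len(l2_penalties), d): one w_hat per lambda."""
    d = H.shape[1]
    HtH, Hty = H.T @ H, H.T @ y          # reuse these across every lambda
    return np.array([np.linalg.solve(HtH + lam * np.eye(d), Hty) for lam in l2_penalties])

rng = np.random.default_rng(2)
H = rng.normal(size=(200, 8))            # 8 synthetic features
y = H @ rng.normal(size=8) + rng.normal(size=200)

lambdas = np.concatenate(([0.0], np.logspace(-2, 5, 50)))
path = ridge_coefficient_path(H, y, lambdas)

# Column path[:, j] traces coefficient j as lambda grows; plotting each column
# against lambda (log axis) gives a coefficient-path figure.
print(path.shape)
```

Each row of `path` is one vertical slice of such a plot: the full w_hat vector for one lambda.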

In this plot, we're drawing the path for our housing example, where we have eight different features: the number of bedrooms, the number of bathrooms, the square feet of living space, the square feet of the lot, the number of floors, the year the house was built, the year the house was renovated, and whether or not the property is waterfront.

Each of these inputs to our model is used as a separate feature. For each one, we're drawing its coefficient value (for example, the coefficient for square feet of living space) for a specific choice of lambda, and how that coefficient varies as I increase lambda. I'm showing this for each of the eight coefficients.

I also want to briefly mention that in this figure, we've rescaled the features so that each of these inputs has unit norm. That's why all of these coefficients are on roughly the same scale, that is, roughly the same order of magnitude.
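
A short sketch of that rescaling, assuming "unit norm" means dividing each feature column by its Euclidean norm over the training data. `normalize_features` is a hypothetical name, and the two-row bedrooms/square-feet matrix is made up.

```python
import numpy as np

def normalize_features(H):
    """Divide each column of H by its 2-norm so every feature has unit norm.

    Returns the normalized matrix and the column norms. A coefficient w_j fit on
    the normalized features gives the same predictions as w_j / norms[j] applied
    to the raw feature j.
    """
    norms = np.linalg.norm(H, axis=0)
    return H / norms, norms

H = np.array([[3.0, 1500.0],     # made-up rows: bedrooms, square feet
              [4.0, 2000.0]])
H_normalized, norms = normalize_features(H)
print(norms)                                  # column norms
print(np.linalg.norm(H_normalized, axis=0))   # each column now has norm 1
```

When predicting on new data, the new features would be divided by the norms computed on the training data, not by their own norms.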

Okay, so what we see in this plot is that when lambda is 0 (or as it goes toward 0), the values of the coefficients, that is, the circles touching this line, give my least squares solution, w hat. As I increase lambda out toward infinity, my solution w hat approaches 0: the whole vector of coefficients goes to 0. We haven't made lambda large enough in this plot to see the coefficients get really, really close to 0, but you can see the trend.

There's also some sweet spot in this plot. I should draw it actually hitting one of these circles, one of the lambda values we considered. This point represents some lambda star: the value of lambda that we want to use when selecting the specific regularized model we'll use to form predictions. We'll discuss how to choose which lambda to use later in the module.
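
As a preview only, here is a sketch of picking lambda star by k-fold cross validation, the general-purpose method named in the module summary. It assumes the rows are already shuffled, uses contiguous folds, scores each fold by validation RSS, and reuses the assumed closed form; `k_fold_cv_error`, `select_lambda`, the grid, and the data are all hypothetical.

```python
import numpy as np

def ridge_closed_form(H, y, l2_penalty):
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + l2_penalty * np.eye(d), H.T @ y)

def k_fold_cv_error(H, y, l2_penalty, k=10):
    """Average validation RSS over k contiguous folds (rows assumed pre-shuffled)."""
    n = len(y)
    errors = []
    for i in range(k):
        start, end = (n * i) // k, (n * (i + 1)) // k
        val = np.arange(start, end)
        train = np.concatenate((np.arange(0, start), np.arange(end, n)))
        w = ridge_closed_form(H[train], y[train], l2_penalty)  # fit without fold i
        residual = y[val] - H[val] @ w                         # score on fold i
        errors.append(residual @ residual)
    return np.mean(errors)

def select_lambda(H, y, l2_penalties, k=10):
    """Return the lambda with the lowest average validation error, plus all errors."""
    cv_errors = [k_fold_cv_error(H, y, lam, k) for lam in l2_penalties]
    return l2_penalties[int(np.argmin(cv_errors))], cv_errors

rng = np.random.default_rng(3)
H = rng.normal(size=(60, 15))                     # made-up data
y = H @ rng.normal(size=15) + 2.0 * rng.normal(size=60)
lambdas = np.logspace(-3, 4, 30)
lambda_star, cv_errors = select_lambda(H, y, lambdas)
print(lambda_star)
```

Once lambda star is chosen, the final model would typically be refit on all of the training data with that penalty.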

But for now, the main point of this plot is to realize that for every value of lambda, every slice of this plot, we get a different solution, a different w hat vector.

[MUSIC]
