Case Study: Predicting House Prices

From the course by University of Washington

Machine Learning: Regression

From the lesson

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions.

To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: A complex model is fit based on a measure of fit to the training data plus a measure of overfitting different than that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model.

Coordinate descent is another, general, optimization technique, which is useful in many areas of machine learning.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So we've gone through the coordinate descent algorithm for

solving our lasso objective for a specific value of lambda,

and that raises the question: how do we choose the lambda tuning parameter value?

Well, it's exactly the same as in ridge regression.

If we have enough data, we can think about holding out a validation set and

using that to choose amongst these different model complexities lambda.

Or if we don't have enough data, we talked about doing cross validation.

So these are two very reasonable options for choosing this tuning parameter lambda.
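The validation-set option can be sketched in a few lines. This is a minimal illustration, not the course's reference implementation: the function names, the synthetic data, the fixed number of sweeps, and the grid of lambda values are all assumptions made for the example. The solver minimizes RSS(w) + lambda * ||w||_1 by cyclic coordinate descent with soft thresholding, and the lambda with the lowest validation error is kept.

```python
import numpy as np

def soft_threshold(rho, threshold):
    """Shrink rho toward zero by threshold; values within the threshold become 0."""
    return np.sign(rho) * max(abs(rho) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Approximately minimize RSS(w) + lam * ||w||_1 (no intercept, fixed sweep count)."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    z = (X ** 2).sum(axis=0)  # squared norm of each feature column
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            residual_without_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual_without_j
            w[j] = soft_threshold(rho, lam / 2.0) / z[j]
    return w

def choose_lambda(X_train, y_train, X_valid, y_valid, lambdas):
    """Fit on the training split for each lambda; keep the lowest validation MSE."""
    best = None
    for lam in lambdas:
        w = lasso_coordinate_descent(X_train, y_train, lam)
        err = np.mean((y_valid - X_valid @ w) ** 2)
        if best is None or err < best[1]:
            best = (lam, err, w)
    return best

# Hypothetical synthetic data: 10 features, only the first 3 are relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
true_w = np.zeros(10)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + 0.5 * rng.normal(size=150)

X_train, X_valid = X[:100], X[100:]
y_train, y_valid = y[:100], y[100:]

lambdas = np.logspace(-2, 3, 12)  # assumed search grid
best_lam, best_err, best_w = choose_lambda(X_train, y_train, X_valid, y_valid, lambdas)
print("chosen lambda:", best_lam, "validation MSE:", best_err)
```

With less data, the same loop would average the error over cross validation folds instead of using a single held-out split.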

But in the case of lasso, I just want to mention that using these types of

procedures, assessing the error on a validation set or doing cross validation,

chooses the lambda that provides the best predictive accuracy.

But what that tends to do is choose a lambda value that's a bit

smaller than might be optimal for doing model selection,

because for predictive accuracy, having slightly less sparse solutions

can actually lead to slightly better predictions on any finite data set

than possibly the true model with the sparsest set of features possible.

So instead, there are other ways that you can choose this tuning parameter lambda

and I'll just refer you to other texts like this textbook by Kevin Murphy,

Machine Learning: A Probabilistic Perspective, for

further discussion on this issue.

So let's just conclude by discussing a few practical issues with lasso.

The first is the fact that, as we've seen in multiple different ways throughout this

module, lasso shrinks the coefficients relative to the least squares solution.

So what it's doing is increasing the bias of the solution in exchange for

having lower variance.

So this is doing this automatic bias variance tradeoff but

we might wanna still have a low bias solution, so

we can actually think about reducing the bias of our solution in the following way.

This is called debiasing the lasso solution, where we run our lasso solver

and we get out a set of selected features, so those are the features whose weights

were not set exactly to zero, and then what we do is we take that reduced model,

the model with these selected features, and

we just run standard least squares regression on that reduced model.

And in this case, what happens is these features that were deemed relevant to our

task, their weights after doing this debiasing procedure

will not be shrunk, relative to the weights of

a least squares solution if we had started exactly with that reduced model.

But, of course, that was the whole point, we didn't know which model, so

the lasso is allowing us to pick out this model, and

then just run least squares on that model.
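The two-step debiasing procedure described above can be sketched as follows. This is only an illustration under stated assumptions: `lasso_cd` and `debias` are hypothetical helper names, the data is synthetic, and lambda is fixed by hand rather than tuned. Step one runs a lasso solver (here, coordinate descent on RSS(w) + lambda * ||w||_1); step two refits ordinary least squares using only the features whose lasso weights are nonzero.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Approximately minimize RSS(w) + lam * ||w||_1 by cyclic coordinate descent."""
    w = np.zeros(X.shape[1])
    z = (X ** 2).sum(axis=0)  # squared norm of each feature column
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ (y - X @ w + X[:, j] * w[j])
            w[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / z[j]
    return w

def debias(X, y, w_lasso, tol=1e-10):
    """Refit unpenalized least squares on the features the lasso selected."""
    support = np.flatnonzero(np.abs(w_lasso) > tol)
    w = np.zeros_like(w_lasso)
    if support.size > 0:
        coef = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        w[support] = coef
    return w, support

# Hypothetical synthetic data: 20 features, only the first 3 are relevant.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [4.0, -3.0, 2.0]
y = X @ true_w + 0.5 * rng.normal(size=200)

w_lasso = lasso_cd(X, y, lam=100.0)  # lam chosen by hand for illustration
w_debiased, support = debias(X, y, w_lasso)

print("selected features:", support)
print("lasso error:   ", np.sum((w_lasso - true_w) ** 2))
print("debiased error:", np.sum((w_debiased - true_w) ** 2))
```

On data like this, the lasso weights on the selected features are pulled toward zero by the penalty, while the refit weights are not, so the debiased estimate sits closer to the true coefficients.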

So these plots show a little illustration of the benefits of debiasing.

So the top figure shows the true coefficients for data,

so it's generated with 4,096 different

coefficients or different features in the model, but

only 160 of these had positive coefficients associated with them.

So it's a very sparse setup, and if you look at the L1 reconstruction,

that's the second row of this plot, you see that it's discovered

1,024 features that have nonzero weights,

and has mean squared error of 0.0072.

But if you take those 1,024 nonzero-weight

features and just run least squares regression on them, you get the third row.

And that has significantly, significantly lower mean squared error. But in contrast,

had you run least squares on the full model with 4,096 features,

you would get a really, really poor estimate of all that's going on and

a very large mean squared error there.

So this shows the importance of doing both lasso and

possibly this debiasing on top of that.

Another issue with lasso is, if you have a collection of strongly correlated

features, lasso will tend to just select amongst them pretty much arbitrarily.

And what I mean is that,

a small tweak in the data might lead to one variable included, whereas

a different tweak of the data would have a different one of these variables included.

So in our housing application,

maybe you could imagine that square feet and lot size are very correlated, and

we might just arbitrarily choose between these, but in a lot of cases,

you actually wanna include the whole set of correlated variables.

And another issue is the fact that, it's been shown empirically that in many cases,

ridge regression actually outperforms lasso in terms of predictive performance.

So there are other variants of lasso, such as something called elastic net,

that try to address this set of issues.

And what it does is, it fuses both the objectives of ridge and

lasso, including both an L1 and an L2 penalty.

And you can see this paper for further discussion of these and other issues with

the original lasso objective, and how elastic net addresses them.
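Written out, the combined objective looks roughly like this. The symbols are assumptions for illustration: RSS(w) is the residual sum of squares used as the measure of fit, and lambda_1 and lambda_2 are two separate tuning parameters, one per penalty.

```latex
\hat{w}^{\text{elastic net}}
  = \arg\min_{w} \; \mathrm{RSS}(w) + \lambda_1 \|w\|_1 + \lambda_2 \|w\|_2^2
```

Setting lambda_2 = 0 recovers the lasso objective, and setting lambda_1 = 0 recovers the ridge objective.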

[MUSIC]
