案例学习：预测房价

Loading...

來自 University of Washington 的課程

机器学习：回归

3650 個評分

案例学习：预测房价

從本節課中

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions. <p> To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: A complex model is fit based on a measure of fit to the training data plus a measure of overfitting different than that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model. <p>Coordinate descent is another, general, optimization technique, which is useful in many areas of machine learning.

- Emily FoxAmazon Professor of Machine Learning

Statistics - Carlos GuestrinAmazon Professor of Machine Learning

Computer Science and Engineering

[MUSIC]

For any specific value of lambda,

we get some balance between this residual sum of squares, and this two norm.

And so what I'm gonna do,

is in this movie, I'm gonna add these two contour plots together.

I'm gonna add, so let me write this down.

Add contour

plots together,

where I'm getting residual sum of squares of w plus lambda 2 norm of w.

Where here the residual sum of squares were these ellipses,

centered about my least squares solution, and there's to norm,

where these circles centered about zero.

And lambda is some weighting on how much I'm including

that two norm penalty or the cost.

And what I'm going to do is I'm going to show a movie as a function of lambda.

So movie, function of

increasing lambda,

where I have my ellipses, and I'm weighting more and

more heavily these contours that are coming from the circle.

The circle terms from this two norm penalty.

Okay, so this is the movie right here, and

my lovely assistant Carlos, will click the mouse to play the movie.

[LAUGH] Since I don't know how to control it from

the tablet, unfortunately.

[LAUGH] Thank you Fanna.

That reference was probably lost on most people.

And in doing so, you didn't get me describing the movie so

let's watch it again.

But what we see this x, let me be clear,

that the x is going to mark the optimal solution,

For a specific lambda and

we're varying lambda so this x is gonna move.

Okay where's the x gonna start?

Well when lambda's equal to zero, we're starting out our lee squared solution and

as lambda increases we know that as lambda goes to infinity

the coefficients are gonna shrink to zero.

But let's visualize the path that it takes as we increase lambda.

Okay so let's play this movie again, gonna start it early square the solution and

we see that the magnitude of our coefficients W0 and

W1 are shrinking smaller and smaller towards zero.

So, maybe we'll play that once more just to visualize this and what we say.

Again, this was just the tail end of the movie.

Is this shrinking magnitude of the coefficients?

Carlos is very excited about this movie, so we're gonna watch it one more time.

It's pretty cool.

We've never actually seen somebody do this visualization.

We think it's really intuitive.

So again, as that land of penalty is increasing,

the magnitude of the coefficients are getting shrunk.

Okay, well now let's talk about what the solution looks like for

a given value of lambda.

Oops, sorry let me turn my pen on.

So for a specific lambda value.

We have some balance between residual sum of squares and

the magnitude of our coefficients.

Lambda's automatically doing some trade-off between the two.

So some balance

Between RSS and our two norm.

And specifically, in this plot, this is our solution.

So, it has some RSS that happens to be, five thousand two hundred fifteen,

that's what the number on this contour is indicating and it has some tune arm,

which has value 4.75, and so

this lambda has chosen the specific trade off and we see that our solution

is somewhere here, which has shrunk from where our least square solution was.

Let's remember our least square solution was somewhere around here.

And the optimal for lambda equals infinity was that zero.

So it's somewhere in between these two values.

And if we had chosen a different value of lambda, let's say a larger value of

lambda We would of had a different solution.

And when I'm drawing all these contours, what I'm saying is,

let me just go back to the original one before this this drawing.

What I'm saying is, every other point along this circle,

has exactly the same residual sum of squares.

But larger l2 norm of w and everywhere

along this circle has exactly the same w2 norm,

sorry, l2 norm of w, but it has larger residual sum of squares.

So that's why this is the optimal trade off for this lambda.

Then, like I drew here, if I chose a larger lambda,

I will get a solution that preferred a smaller two norm and

a larger residual sum of squares.

So, this would be solution for

a larger lambda value.

Okay, so this is just a little visualization of what a ridge regression

solution looks like.

[MUSIC]