Case Study: Predicting House Prices

Linear Regression, Ridge Regression, Lasso (Statistics), Regression Analysis

From this lesson

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions.

To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: A complex model is fit based on a measure of fit to the training data plus a measure of overfitting different than that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model.

Coordinate descent is another, general, optimization technique, which is useful in many areas of machine learning.

#### Emily Fox

Amazon Professor of Machine Learning

#### Carlos Guestrin

Amazon Professor of Machine Learning

[MUSIC]

So finally, I just wanted to present the coordinate descent algorithm for

lasso if you don't normalize your features.

So this is the most generic form of the algorithm,

because of course it applies to normalized features as well.

But let's just remember our algorithm for our normalized features.

So, here it is now.

And relative to this,

the only changes we need to make are what's highlighted in these green boxes.

And what we see is that we need to precompute, for each one of our features,

this term z j, and that's exactly equivalent to the normalizer that we

described when we normalized our features.

So if you don't normalize, you still have to compute this normalizer.

But we're gonna use it in a different way as we're going through this algorithm.

When we go to compute rho j, we're looking at our unnormalized features.

And when we're forming our predictions, y hat sub i, so our prediction for

the ith observation, again, that prediction is using unnormalized features.

So there are two places in the rho j computation where you would need to

change things for unnormalized features.
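As a rough written form of the two quantities just described, one can use the following. The notation is an assumption here and does not appear in the transcript: $h_j(x_i)$ stands for the value of the j-th unnormalized feature on the i-th observation, and $\hat{y}_i(\hat{w}_{-j})$ for the prediction for observation $i$ made with feature $j$ left out. The normalizer $z_j$ is taken to be the sum of squared feature values, so that it equals 1 when the features are already normalized.

$$
z_j = \sum_{i=1}^{N} h_j(x_i)^2,
\qquad
\rho_j = \sum_{i=1}^{N} h_j(x_i)\,\bigl(y_i - \hat{y}_i(\hat{w}_{-j})\bigr)
$$

The two places where unnormalized features enter $\rho_j$ are the factor $h_j(x_i)$ and the prediction $\hat{y}_i(\hat{w}_{-j})$.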

And then finally when we're setting w hat j according to the soft thresholding rule,

instead of just looking at rho j plus lambda over two,

or rho j minus lambda over two, or zero,

we're gonna divide each of these terms by z j, this normalizer.
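Written out, the soft thresholding rule with this extra division might look as follows. This is a sketch assuming the lasso objective is the residual sum of squares plus $\lambda$ times the L1 norm of the coefficients, which is what produces the $\lambda/2$ thresholds mentioned here.

$$
\hat{w}_j =
\begin{cases}
(\rho_j + \lambda/2)\,/\,z_j & \text{if } \rho_j < -\lambda/2 \\
0 & \text{if } -\lambda/2 \le \rho_j \le \lambda/2 \\
(\rho_j - \lambda/2)\,/\,z_j & \text{if } \rho_j > \lambda/2
\end{cases}
$$

Dividing zero by $z_j$ leaves it at zero, so only the two nonzero cases change. With normalized features $z_j = 1$ and the rule reduces to the version without the division.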

Okay, so you see that it's fairly straightforward to implement this for

unnormalized features, but the intuition we provided was much clearer for

the case of normalized features.
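The steps described in this segment can be put together as a minimal Python sketch. It is an illustration under stated assumptions, not the course's reference implementation: the function name `lasso_coordinate_descent`, its parameters, and the stopping rule are hypothetical, the objective is assumed to be RSS plus `lam` times the L1 norm, and every coefficient (including any intercept column) is penalized the same way.

```python
import numpy as np

def lasso_coordinate_descent(H, y, lam, tol=1e-6, max_iter=1000):
    """Cyclic coordinate descent for RSS(w) + lam * ||w||_1.

    H holds unnormalized features (one column per feature).
    Names and the stopping rule are illustrative assumptions.
    """
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = H.shape[1]
    w = np.zeros(n_features)

    # Precompute the normalizer z_j for every feature (sum of squares).
    z = (H ** 2).sum(axis=0)

    for _ in range(max_iter):
        max_step = 0.0
        for j in range(n_features):
            if z[j] == 0.0:
                continue  # an all-zero feature cannot affect the fit

            # Residual of the prediction made without feature j,
            # computed from the unnormalized features.
            residual = y - H @ w + H[:, j] * w[j]
            rho_j = H[:, j] @ residual

            # Soft thresholding, with each nonzero case divided by z_j.
            if rho_j < -lam / 2:
                new_w_j = (rho_j + lam / 2) / z[j]
            elif rho_j > lam / 2:
                new_w_j = (rho_j - lam / 2) / z[j]
            else:
                new_w_j = 0.0

            max_step = max(max_step, abs(new_w_j - w[j]))
            w[j] = new_w_j

        # Stop once no coordinate moved by more than tol in a full pass.
        if max_step < tol:
            break
    return w
```

Calling `lasso_coordinate_descent(H, y, lam)` with a large enough `lam` drives every coefficient to exactly zero, which is the feature-selection behavior the module describes. In practice one might leave an intercept column unpenalized, but that choice is not covered in this segment.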

[MUSIC]