[MUSIC] So, now we're gonna describe what the variant of this coordinate descent algorithm looks like in the case of lasso. Again, we're gonna be looking at these normalized features. And just remember, this is where we left off with our coordinate descent algorithm for plain least squares, un-regularized regression. And remember, the key point was that we set w hat j equal to rho j, this correlation between our feature and the residuals from a prediction leaving feature j out of the model. Well, in the case of lasso, how we set w hat j is gonna depend on the value of our tuning parameter lambda, and how that relates to this rho j correlation term. In particular, if rho j is small, if it's in this minus lambda over 2 to lambda over 2 range, where again what counts as small is determined by lambda, we're gonna set that w hat j exactly equal to zero. And here we see the sparsity of our solutions coming out directly. But in contrast, if rho j is really large or, on the flip side, very negative, meaning the correlation is strong in either direction, then we're gonna include that feature in the model, just like we did in our least squares solution. But relative to our least squares solution, we're gonna decrease the magnitude of the weight. So in the positive case, if we have a strong positive correlation rho j, instead of setting w hat j equal to rho j, we're gonna set it equal to rho j minus lambda over 2. And on the negative side, we're gonna add lambda over 2. So let's look visually at this function of how we're setting w hat j. Okay, well, this operation that we're performing here in these lasso updates is something called soft thresholding. And so, let's just visualize this. And to do this we're gonna make a plot of rho j, that correlation we've been talking about, versus w hat j, the coefficient that we're setting. And remember, in the least squares solution, we set w hat j equal to rho j.
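The three-case update described above can be sketched as a small function. This is a minimal sketch, assuming normalized features so that rho j is the only input besides lambda; the function name `soft_threshold` is just an illustrative label.

```python
def soft_threshold(rho_j, lam):
    """Lasso coordinate descent update for feature j (normalized features).

    rho_j: correlation between feature j and the residuals computed
           with feature j left out of the model.
    lam:   the lasso tuning parameter lambda.
    """
    if rho_j < -lam / 2:
        # Strong negative correlation: keep the feature, shrink toward zero
        return rho_j + lam / 2
    elif rho_j > lam / 2:
        # Strong positive correlation: keep the feature, shrink toward zero
        return rho_j - lam / 2
    else:
        # Weak correlation: set the weight exactly to zero (sparsity)
        return 0.0
```

For example, with lambda = 2, a correlation of rho j = 3 gives a weight of 2 (shrunk by lambda over 2), while any rho j between -1 and 1 gives a weight of exactly zero.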
And we can see that by setting lambda equal to zero. Remember, lambda equals zero returns us to our least squares solution, so I'll specifically write least squares there. So that's why we get this line y equals x, this green line appearing here. So this represents, as a function of rho j, how we would set w hat j for least squares. And in contrast, this fuchsia line here is for lasso. And what we see is that in the range minus lambda over 2 to lambda over 2, if the correlation is within this range, meaning there's not much of a relationship between our feature and the residuals from predictions without feature j in our model, we're just gonna completely eliminate that feature. We're gonna set its weight exactly equal to 0. But if we're outside that range, we're still gonna include the feature in the model, but we're gonna shrink the weight on that feature, relative to the least squares solution, by an amount lambda over 2. So this is why it's called soft thresholding: we're shrinking the solution everywhere, but we're strictly driving it to zero between minus lambda over 2 and lambda over 2. And I just wanna contrast this with, let me choose a color that we don't have here, I guess red will work. I wanna contrast with the ridge regression solution, where you can show, which we're not gonna do here, that the ridge regression solution shrinks the coefficients everywhere, but never strictly to zero. So this is the line w hat ridge. And let me just write that this is w hat lasso. Okay, so here we have a very clear visualization of the difference between least squares, ridge, and lasso. [MUSIC]
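The three lines in the plot can be sketched numerically as three update rules, each mapping rho j to w hat j. This is an illustrative sketch: the ridge formula rho j / (1 + lambda) is the standard coordinate update for normalized features, stated here as an assumption since the derivation is not shown in this lecture.

```python
def least_squares_update(rho_j):
    # The green y = x line: no regularization, w_hat_j = rho_j.
    return rho_j

def ridge_update(rho_j, lam):
    # Ridge shrinks everywhere, but never exactly to zero
    # (assumed closed form for normalized features).
    return rho_j / (1.0 + lam)

def lasso_update(rho_j, lam):
    # Lasso soft thresholding: zero inside [-lambda/2, lambda/2],
    # shrunk by lambda/2 outside it.
    if rho_j < -lam / 2:
        return rho_j + lam / 2
    elif rho_j > lam / 2:
        return rho_j - lam / 2
    return 0.0
```

Evaluating all three at a small rho j, say 0.4 with lambda = 1, makes the key difference concrete: least squares keeps 0.4, ridge shrinks it to 0.2 but not to zero, and lasso drives it exactly to zero.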