Let's consider a different kind of residual. Consider the model $y = \tilde{X}\tilde{\beta} + \epsilon$, where $\tilde{X} = [X \;\; \delta_i]$ is my traditional design matrix $X$ augmented with the vector $\delta_i$, a vector of 0s with a 1 only in the $i$th position. My $\tilde{\beta}$ vector is the usual coefficient vector $\beta$, the one I almost always have, together with an extra parameter $\Delta_i$; the little $i$ indicates that it's the parameter devoted just to the $i$th data point.

Now let's figure out what the MLE is when we just add this extra term. The least squares criterion is $\| y - \tilde{X}\tilde{\beta} \|^2$, which I can equivalently write as
$$
\sum_{i'} \Big( y_{i'} - \sum_k x_{i'k}\beta_k - \Delta_i \, \delta_i(i') \Big)^2 ,
$$
where I use $i'$ for the summation index so we don't mix it up with the index $i$ that we fixed earlier, and $\delta_i(i')$ is the $i'$th element of $\delta_i$. Those elements are 0 except in the one instance where $i' = i$, so for $i' \neq i$ the term is just $\big( y_{i'} - \sum_k x_{i'k}\beta_k \big)^2$, and for $i' = i$ it is $\big( y_i - \sum_k x_{ik}\beta_k - \Delta_i \big)^2$. So the criterion is
$$
\sum_{i' \neq i} \Big( y_{i'} - \sum_k x_{i'k}\beta_k \Big)^2 + \Big( y_i - \sum_k x_{ik}\beta_k - \Delta_i \Big)^2
\;\geq\;
\sum_{i' \neq i} \Big( y_{i'} - \sum_k x_{i'k}\beta_k \Big)^2 ,
$$
with equality if that last term is 0, and one way to make it 0 is to set $\Delta_i = y_i - \sum_k x_{ik}\beta_k$. We're then left with the sum over all points $i' \neq i$, and we know what minimizes that: notice the sum is over everything except the $i$th data point, so if I plug in the least squares estimate where I've deleted the $i$th data point, call it $\hat{\beta}^{(-i)}$, that sum is minimized. And then
$$
\hat{\Delta}_i = y_i - \sum_{k=1}^{p} x_{ik}\hat{\beta}_k^{(-i)} .
$$

So let's look at this term $\hat{\Delta}_i$ now. Notice the format of it: it's $y_i - \hat{y}_i^{(-i)}$, the $i$th outcome minus the fitted value for the $i$th data point where the $i$th data point wasn't used in the fitting. There are a couple of interesting points to make out of this. One, adding a regressor that is 1 only for the $i$th data point is equivalent to deleting that data point from the analysis. So that's an interesting way to delete a data point. And two, the coefficient for that term is a leave-one-out residual: the difference between the $i$th outcome and what you would predict for the $i$th data point if that data point wasn't allowed to influence the model. This is called a PRESS residual, or a leave-one-out cross-validation residual. The other interesting fact is that you can obtain the leave-one-out cross-validation residual just by fitting a model with this extra term in.
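To make that equivalence concrete, here is a small numerical sketch in Python with NumPy. This is not the course's own code; the simulated data, the chosen index `i`, and names like `X_tilde` are illustrative assumptions. It fits the augmented model with the indicator column and compares its last coefficient to the leave-one-out prediction error obtained by actually deleting the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, i = 20, 3, 7                      # sample size, number of predictors, flagged observation

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # design with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Augmented fit: add a column that is 1 only for observation i.
delta_i = np.zeros(n)
delta_i[i] = 1.0
X_tilde = np.column_stack([X, delta_i])
beta_tilde, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
Delta_hat = beta_tilde[-1]              # coefficient on the indicator column

# Direct leave-one-out fit: delete observation i, then predict it.
mask = np.arange(n) != i
beta_loo, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
press_resid = y[i] - X[i] @ beta_loo    # y_i - yhat_i^(-i)

print(Delta_hat, press_resid)           # the two numbers should agree up to rounding
```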
Another interesting fact is that in the coefficient table, the t-test for $\Delta_i$ is a sort of outlier test for the $i$th data point: if you need a coefficient devoted just to that data point, then that data point is an outlier. And the t-test for this is actually a valid t-test, so it gives us a form of standardized residual for the $i$th data point that is t-distributed. So there are a lot of interesting facts here. One fact that we'll cover later on is that you don't actually have to fit this model, or in any way delete the $i$th data point, in order to obtain these residuals. To get leave-one-out cross-validated residuals, you don't actually have to leave one out, which is a surprising thing about linear models. But these residuals are quite useful for a variety of reasons. One, they have this motivation as an observation-specific mean shift. Two, cross-validated residuals like this have an obvious, intuitive interpretation: how different is my outcome from what my model would predict when that data point hasn't been allowed to impact the model fitting? That's a very powerful idea for assessing things like model fit. And finally, the t-test you get from fitting the model with this extra term for the $i$th data point yields a useful form of standardized residual that is exactly t-distributed, so we can establish cutoffs for thresholding them, and so on. So now we're going to go through some coding examples, and then we're going to delve into these PRESS residuals a little bit more.
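As a preview of the no-refitting shortcut mentioned above (these are not the course's own coding examples), one standard route is the hat-matrix identity: the PRESS residual equals the ordinary residual divided by one minus the leverage, $e_i/(1 - h_{ii})$. Here is a minimal NumPy sketch under that identity, assuming a full-rank design; the data and names are again purely illustrative.

```python
import numpy as np

def press_residuals(X, y):
    """Leave-one-out (PRESS) residuals from a single full-data fit.

    Uses the identity press_i = e_i / (1 - h_ii), where e_i are the ordinary
    residuals and h_ii are the diagonal entries of the hat matrix
    H = X (X'X)^{-1} X'.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat                       # ordinary residuals
    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix (fine for small n)
    h = np.diag(H)                             # leverages
    return e / (1 - h)

# Illustrative usage with simulated data.
rng = np.random.default_rng(1)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2 + 3 * X[:, 1] + rng.normal(size=n)
print(press_residuals(X, y)[:5])
```

The exactly t-distributed standardized residual mentioned above is, in standard terminology, the externally studentized residual, which divides $e_i$ by an estimate of $\sigma$ that also excludes observation $i$; we'll return to that when we delve into these residuals.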