So, the first thing that we're going to try to do is really simple.

We're just going to take a raw average, or the mean, of the data set.

And so, as you can probably guess, every entry is going to get the same exact

prediction. We're literally just going to take the

mean of all the values, right? So we'll add them up and divide by the

total number. Sounds really simple, right?

And so, every entry's going to get the same prediction.

We'll do that right now really quickly. We just add 5, 4, 2, 4, 4, 3, 2, 2,

4, 5, 3, 5, 1, 4, 4, 4, 4, 3, 2 and 5. And we add them and divide by the total

number. We have 20 values there.

And so remember, again, we're not including the test set.

And obviously, we're not including the values that we don't have.

So this is just training. We're just using the training data to try

to develop our predictor. So we add 5 and 4 and 2 and 4 and 4 and

we keep adding those up. And then we divide by the total number

which is 20, and we get 70 over 20, if you add that up, which is 3.5.
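As a quick check of the arithmetic, here is a minimal Python sketch of the raw-average predictor. The list holds the 20 training ratings read out above; the variable names are illustrative choices, not something from the lecture.

```python
# The 20 known training ratings listed in the lecture.
training_ratings = [5, 4, 2, 4, 4, 3, 2, 2, 4, 5,
                    3, 5, 1, 4, 4, 4, 4, 3, 2, 5]

# Raw average: add everything up and divide by the count.
total = sum(training_ratings)
count = len(training_ratings)
raw_average = total / count

print(total, count, raw_average)  # 70 20 3.5
```

Every entry, known or unknown, would then be predicted as `raw_average`.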

And so, what that really means is that the mean, or raw average, prediction

here is that everyone would rate every movie at 3.5 stars,

right? So all these values get a 3.5: the

unknown ratings, we say, okay, those will be 3.5s, the test data, we say

those'll be 3.5s and so on. So now, let's try to evaluate the RMSE,

right, because we just filled out the tables.

Let's see how well we did, right? So we have to figure out how to do the

RMSE and we'll walk through this now step by step.
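For reference, the quantity being computed can be written as one formula. The symbols are notation introduced for this note, not from the lecture: $n$ is the number of ratings being evaluated, $r_i$ is the $i$-th actual rating, and $\hat{r}_i$ is the prediction for it.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( r_i - \hat{r}_i \right)^2}
```

Reading it from the inside out gives the order of the steps: error, square, mean, root.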

And just so you know, this is the most math that we will do in this

course. I think in the PageRank

lecture we also did a little bit with simultaneous equations, but this is

the most complicated it will get from here on out.

In the next three lectures we'll do no math; we'll be back to

very simple arithmetic. But here, just bear with us; it

will be fine, and we'll explain everything step by step, and it'll make sense.

So the first thing: when we do the RMSE, which is root-mean-squared

error, we have to work backwards, right?

We're going to work our way back out to the root here, so we're going to start

with squaring the errors. So the first thing we have to do is find

the errors, or the differences, right? So, we're going to do this for the test

set. We'll illustrate this for the test set.

When we find the differences, we just have to subtract the values, right?

So we take 4, and we subtract the prediction value of 3.5.

So 4 minus 3.5, okay? We do this for this 2 right here and we

have another value of 3.5, so we have 2 minus 3.5.

We do 5 minus 3.5, so there's 5 minus 3.5.

We do 3 minus 3.5, so we have 3 minus 3.5.

And finally, we again have 4 minus 3.5, so 4 minus 3.5.

And so now we can just fill in what each of those is, right? The first is

going to be positive 0.5, and the second is going to be negative 1.5.

The positives and negatives don't matter, as we'll see in a minute. Next, 5

minus 3.5 is going to give us 1.5. So these are just equals signs over

here. And 3 minus 3.5 is going to give us

negative 0.5. And 4 minus 3.5 is going to give us

positive 0.5. So now, we need to square the differences;

before we take the mean, we'll square the differences, okay?

So then we take these values and we just square them all.

So, 0.5 squared is 0.25, negative 1.5 squared is 2.25.

Again, the positives and negatives don't matter because we're squaring the values.

And if you don't know what squaring is, it's really just taking a number and

multiplying it by itself, right? So, 3 squared is 3 times 3 which is 9, 4

squared is 4 times 4 which is 16, and so on. And so this is really 0.5 times 0.5 which

gives us 0.25. This is negative 1.5 times negative 1.5,

and that becomes positive. So again, when you square something, it

just becomes positive. This, we're going to square and we're

going to get 2.25. And this, we're going to square and we're

going to get 0.25. And this, we'll square again and we'll

get 0.25. All right.
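The first two steps, finding the errors and squaring them, can be sketched in Python as follows. The five test ratings and the 3.5 prediction come from the walkthrough above; the variable names are illustrative assumptions.

```python
# Actual test-set ratings from the lecture and the constant prediction.
test_ratings = [4, 2, 5, 3, 4]
prediction = 3.5

# Step 1: errors (actual rating minus predicted rating).
errors = [r - prediction for r in test_ratings]

# Step 2: square each error, which removes the sign.
squared_errors = [e * e for e in errors]

print(errors)          # [0.5, -1.5, 1.5, -0.5, 0.5]
print(squared_errors)  # [0.25, 2.25, 2.25, 0.25, 0.25]
```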

So now, we've squared the differences. Now, we have to take the mean.

So, all these values right here, we want to find the mean of all of those

values. So now, we're up to the mean part.

So we did the squared error; now, we're finding the mean.

So we have 0.25, 2.25, 2.25, 0.25, and 0.25.

So we have to add them up and divide by the total number, so there's five of

them. So we do 0.25 plus 2.25 plus 2.25 plus

0.25 plus 0.25 and we divide that by 5 to get the mean.

And so, when we do this out, we get 5.25 over 5 which is 1.05 as the mean, okay?

And now, finally, we have to do the root. So now, remember we squared these values

to begin with. So now, we take the square root of the

whole deal, this 1.05, to get the final error.

So when we take the square root of 1.05, that

gives us about 1.0247. So what that's saying is, if we multiply

1.0247 by 1.0247, we'll get about 1.05; that's what a square root is, the opposite of

squaring. So, now, that's the RMSE on the test set

and that's how we compute it and we get 1.0247.

So, for the test set, we'll just write down here 1.0247.
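Putting the steps together, a minimal Python sketch of the whole test-set calculation might look like this. The helper name `rmse` is a hypothetical choice for this example; it simply follows the error, square, mean, root order described above.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length lists."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    mean_squared_error = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mean_squared_error)

test_ratings = [4, 2, 5, 3, 4]
predictions = [3.5] * len(test_ratings)  # every entry gets the raw average

test_rmse = rmse(test_ratings, predictions)
print(round(test_rmse, 4))  # 1.0247
```

The same function works for any predictor, since it only needs the actual and predicted values.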

And then, similarly, we could also do it on the training set, right, so, on the

values that we do know. So we just did this for the values that

we weren't using to build the predictor; then how about the ones that we actually did use,

the remaining 20? And that will be much more tedious

to do by hand, so we won't do that here, but you can work that out and you get

1.1619. Let me do that.

So basically, rather than taking the mean over 5 values, you'd be

taking the mean over 20 values, which is a lot.
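Since the 20-value case is tedious by hand, here is a short Python sketch that checks it, assuming the training set is exactly the 20 ratings listed earlier in the lecture. The variable names are illustrative.

```python
import math

# The 20 training ratings listed earlier in the lecture.
training_ratings = [5, 4, 2, 4, 4, 3, 2, 2, 4, 5,
                    3, 5, 1, 4, 4, 4, 4, 3, 2, 5]

# The predictor is the raw average of these same values (3.5).
raw_average = sum(training_ratings) / len(training_ratings)

# Error, square, mean, root: the same steps as for the test set.
squared_errors = [(r - raw_average) ** 2 for r in training_ratings]
train_rmse = math.sqrt(sum(squared_errors) / len(squared_errors))

print(sum(squared_errors))   # 27.0
print(round(train_rmse, 4))  # 1.1619
```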

But, if you haven't already guessed: this is really great for illustrating the

idea of how you compute RMSE, and it's the same regardless of what prediction

method you're using, but this is a really lazy predictor.

So this is a very simple scheme. We're not taking anything into account:

no biases, no user-to-user similarities, or anything like that.

We're just taking a simple mean of everything, a raw average.

And so we need to take more things into consideration in order to get this RMSE

on the test set and the training set to go down.