And this really forms the basis for what we call the baseline predictor.
What we do is start with the raw average, which, remember from last time, was
just 3.5. Then we add in a bias for the user and we add in a bias for the movie.
So the biases are going to be positive if the movie is better received or if
the user tends to be a more lenient critic. And they're going to be negative if
the movie tends to be worse received or if the critic tends to be more harsh.
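Written as a formula, the baseline predictor described above looks roughly like this. The symbols are labels chosen here for illustration, not notation from the lecture: \hat{r}_{ui} is the predicted rating of user u for movie i, \bar{r} is the raw average, b_u is the user's bias, and b_i is the movie's bias.

```latex
\hat{r}_{ui} = \bar{r} + b_u + b_i, \qquad \bar{r} = 3.5
```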
And so we have to find the bias for each of these users and each of these
movies. That's the whole idea here: we really need to find those bias values.
Finding the biases is somewhat of an art, actually. There is a quote-unquote
right way to do it, which would be to solve a complicated optimization problem,
but we're not dealing with any calculus or linear algebra in this course, so we
won't go that far.
We will just look at a simpler approach to doing this.
And so we need to find the bias values so we'll try to do something intuitive.
Why don't we just compare each user's or movie's own average to the overall
mean? Again, remember the mean over the whole data set was 3.5, so let's see how
much higher or lower that user's or movie's average is than 3.5, and that's how
we find the bias.
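In symbols, this intuitive rule can be sketched as follows. The notation is an assumption made here for illustration: R_u is the set of movies user u has rated in the training data, R_i is the set of users who have rated movie i in the training data, r_{ui} is an observed rating, and \bar{r} is the overall mean of 3.5.

```latex
b_u = \frac{1}{|R_u|} \sum_{i \in R_u} r_{ui} - \bar{r},
\qquad
b_i = \frac{1}{|R_i|} \sum_{u \in R_i} r_{ui} - \bar{r}
```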
So we take the average of the ratings we have for that user or movie.
We'll start with the harsh critic, D, so we take the average of these values:
2, 3, 1, and 2. Again, remember we're not including anything that is in the
test set when we work these biases out. So that's 2 plus 3 plus 1 plus 2,
divided by 4, just for those values, and that is 8 divided by 4, which is 2.
But then we have to subtract out the mean, because we want the bias relative to
the overall mean, so we do 2 minus 3.5, which is negative 1.5. So that's the
bias, and that should make sense because D is a harsh critic: at negative 1.5,
he's well below the overall average rating of 3.5.
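The arithmetic for D can be checked with a few lines of Python. This is only a sketch of the simple rule from the lecture; the function name `bias` is made up here for illustration.

```python
def bias(train_ratings, overall_mean):
    """Mean of the given training ratings minus the overall mean."""
    return sum(train_ratings) / len(train_ratings) - overall_mean

OVERALL_MEAN = 3.5        # raw average of the data set, from the lecture
d_ratings = [2, 3, 1, 2]  # D's training-set ratings, from the lecture

d_bias = bias(d_ratings, OVERALL_MEAN)
print(d_bias)  # -1.5
```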
That was for D. Now let's try the good movie, movie 3, and take the average
of the values again. We add up 4 plus 5 plus 3 plus 5, and there are four of
those values. This one we don't have, and this one is in the test set, so we
don't use it. So this comes out to be 17 over 4, which is 4.25. And then,
remember, we subtract the overall mean of 3.5, so this is 4.25 minus 3.5, which
gives us positive 0.75.
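One way to make the "skip what we don't have and what is in the test set" step explicit is to tag each entry before averaging. The sketch below uses the four training ratings for movie 3 from the lecture; the order of the entries, the held-out rating value, and the names `movie_3` and `training_bias` are hypothetical placeholders.

```python
OVERALL_MEAN = 3.5  # raw average of the data set, from the lecture

# Movie 3's entries as (rating, in_test_set) pairs.
# None marks a rating we do not have. The order of the entries and the
# held-out rating value (4) are hypothetical placeholders.
movie_3 = [(4, False), (5, False), (None, False), (3, False), (4, True), (5, False)]

def training_bias(entries, overall_mean):
    """Bias from training ratings only: skip missing and test-set entries."""
    usable = [r for r, in_test in entries if r is not None and not in_test]
    return sum(usable) / len(usable) - overall_mean

print(training_bias(movie_3, OVERALL_MEAN))  # 0.75
```

Whatever value the held-out entry has, it never enters the average, so the bias stays at 0.75.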
So for D we have a bias of negative 1.5, and for movie 3 we have positive
0.75, which is reasonably far above zero. That's what we expected, because 3
was a much better movie than the others. And the key idea here, again, is that
you can't use the test set. So the test set has to go.
We're not using it in these prediction schemes because that's what we're going
to test the RMSE on. We can work the rest of the values out the same way, and
note that I've given the values at the end of the columns or rows here. So the
bias for A is positive 0.83, for B is positive 0.5, and for C is negative 1.
C is really easy to see, for instance: right now there are only two values, a 2
and a 3, so we just do 2 plus 3, divided by 2, minus 3.5. That's 5 divided by
2, which is 2.5, minus 3.5, which is negative 1.0. And you can verify the rest
of these also. Then on the movie side, again, movie 2 is actually a pretty
negatively rated movie: 3 plus 2 plus 2, divided by 3, and then subtract out
the mean of 3.5 from that.