In this lecture, we're going to extend what we did in the previous lecture to complete our implementation of a fully-fledged latent factor model. Again, this is a fairly complex implementation to get your head around, but it's going to be fairly similar to what we did in the previous lecture. All the details will be the same: we're going to train it using the same gradient descent library, implementing a cost function and a derivative function, but those cost and derivative functions will become more complex because we're using a more sophisticated model. So we're going to extend our implementation from the previous lecture to implement the complete latent factor model. Similar to the model we had in the previous lecture, where we just had an Alpha, a bias term Beta u for each user, and a bias term Beta i for each item, we now also have the additional terms Gamma u and Gamma i of our fully-fledged latent factor model, as well as the additional terms in our regularizer. That would be Lambda times the L2 norm of all of our terms: Beta u, Beta i, Gamma u, and Gamma i, for each user u, item i, and dimension k. So again, just like before, our tasks are going to be to write down those gradient descent equations and to convert those into Python code, which is going to be fairly detailed to get right. Secondly, using the gradient descent library, we optimize the model, but that part is exactly the same as in the previous lecture. All that's really different here are those cost and derivative functions. So we're going to start like we did before with a few definitions and utilities: we start by defining our offset term Alpha, a bias term for each user, a bias term for each item, and a few additional parts. Now that we have latent factors, we have additional parameters corresponding to the user Gamma for each user and the item Gamma for each item, and all of those parameters are actually vectors of parameters.
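To make those definitions concrete, here's a hedged sketch of what the setup might look like. The toy dataset, the variable names (`ratings`, `userGamma`, `itemGamma`, and so on), and the initialization scheme are hypothetical stand-ins for illustration, not the lecture's actual code:

```python
import random

random.seed(0)

# Hypothetical toy data: (user, item, rating) triples; in the lecture these
# would come from a real ratings dataset.
ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 2, 3.0)]
nUsers, nItems = 2, 3
K = 2  # number of latent (preference/property) dimensions

# Offset term Alpha, initialized to the mean rating
alpha = sum(r for _, _, r in ratings) / len(ratings)

# Bias terms Beta_u and Beta_i, one per user / per item
userBiases = [0.0] * nUsers
itemBiases = [0.0] * nItems

# Latent factors Gamma_u and Gamma_i: a K-dimensional vector per user / item,
# initialized to small random values so the gradients are non-degenerate
userGamma = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(nUsers)]
itemGamma = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(nItems)]
```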
So for each user u, we have a vector of parameters saying, "What are the preference dimensions for that user u?" and for each item i, we have a vector of parameters saying, "What are the properties of that item?" We also set some dimensionality; in this case we set the number of latent factors to be small, to keep a fairly simple model. Again, we need a function to convert from a flat parameter vector to the parameters Alpha, Beta, and Gamma, since the gradient descent library we use, like before, is going to assume we have a contiguous vector Theta which describes all of the parameters in our model. We have to extract from that contiguous vector the offset parameter, the bias term for each user, the bias term for each item, the preference vector for each user, and the preference vector for each item. So again, this is fairly similar to the code we had before. It's going to iterate through all of the positions in Theta and ask, "What parameter does this correspond to?" The first value is going to be the offset Alpha; the next set of values, one per user, are going to be the user biases, followed by all the item biases; then we iterate through all users and all dimensions and extract the user preference for each preference dimension, and then the item preference for each preference dimension. You could do this in a different order, but this is a fairly simple way to convert a vector of parameters to Alpha, Beta, and Gamma. We have a few more utility functions for this new version of the model, and we have to update our prediction function. The only real additional utility function we need here is a function to compute the inner product between two vectors, which we're going to use to compute the inner product between Gamma u and Gamma i. It's a simple function to implement, though of course we could do the same thing using NumPy or another library if we'd like.
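A minimal sketch of that unpacking function might look as follows. The ordering (Alpha, then the user biases, the item biases, the user Gammas, and the item Gammas) matches the one described above, but the sizes and the name `unpack` are assumptions for illustration:

```python
nUsers, nItems, K = 2, 3, 2  # hypothetical sizes

def unpack(theta):
    # theta is one flat vector:
    # [alpha | user biases | item biases | user gammas (user-major) | item gammas]
    alpha = theta[0]
    userBiases = list(theta[1:1 + nUsers])
    itemBiases = list(theta[1 + nUsers:1 + nUsers + nItems])
    index = 1 + nUsers + nItems
    userGamma, itemGamma = [], []
    # Extract one K-dimensional preference vector per user...
    for u in range(nUsers):
        userGamma.append(list(theta[index:index + K]))
        index += K
    # ...and one K-dimensional property vector per item
    for i in range(nItems):
        itemGamma.append(list(theta[index:index + K]))
        index += K
    return alpha, userBiases, itemBiases, userGamma, itemGamma
```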
So we're just iterating through all of the dimensions in x and y and multiplying the corresponding dimension of x with the corresponding dimension of y. The prediction function is mostly the same as before: for a given user and a given item, whose rating we're trying to predict, we take the offset term Alpha plus the user bias plus the item bias, and now add the inner product between the user Gamma and the item Gamma for that user and item pair. A little more complicated is the updated cost function, which is just the mean squared error of our predictions in addition to the regularization components. The mean squared error is very similar to before: we just use this new prediction function to make all of the model's predictions. We print out the mean squared error just for debugging purposes, and then for every term in our model other than Alpha itself, we add the squared value of that parameter multiplied by Lambda to compute our regularizer, and we return that cost. Next, we have our derivative function, which is much, much more complicated. For the most part, I think this is something you should look through on your own. The idea is the same as in the previous lecture: we iterate through all the points (u, i) in our dataset and update the corresponding derivative terms. So each time we see an instance (u, i), we're going to update the offset term Alpha, the bias term for the corresponding user, the bias term for the corresponding item, and then, for each preference dimension k, we'll update Gamma u k and Gamma i k using the derivative equations I gave previously. There's a lot more code here, and it's quite hard to follow. This is probably not something I expect most people to be able to implement on their own, but it demonstrates the actual process of coming up with a working implementation of a latent factor model using a gradient descent library.
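Putting those pieces together, here's a hedged sketch of the inner product, prediction, cost, and derivative functions. The toy data, the parameter values, and the regularization constant `lamb` are all made up for illustration, and the debugging print is omitted; the lecture's actual code differs in its details:

```python
lamb = 0.001  # hypothetical regularization strength Lambda

# Tiny stand-in dataset and parameters, just to make the sketch runnable
ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0)]
nUsers, nItems, K = 2, 2, 2
alpha = 3.0
userBiases, itemBiases = [0.1, -0.1], [0.2, -0.2]
userGamma = [[0.1, 0.0], [0.0, 0.1]]
itemGamma = [[0.1, 0.1], [0.0, 0.1]]

def inner(x, y):
    # Dot product of two vectors; NumPy would do the same thing
    return sum(a * b for a, b in zip(x, y))

def prediction(u, i):
    # alpha + beta_u + beta_i + gamma_u . gamma_i
    return alpha + userBiases[u] + itemBiases[i] + inner(userGamma[u], itemGamma[i])

def cost():
    # Mean squared error over all (user, item, rating) triples...
    mse = sum((prediction(u, i) - r) ** 2 for u, i, r in ratings) / len(ratings)
    # ...plus Lambda times the sum of squares of every parameter except Alpha
    reg = sum(b ** 2 for b in userBiases) + sum(b ** 2 for b in itemBiases)
    reg += sum(g ** 2 for gs in userGamma for g in gs)
    reg += sum(g ** 2 for gs in itemGamma for g in gs)
    return mse + lamb * reg

def derivative():
    N = len(ratings)
    dalpha = 0.0
    dUserBiases = [0.0] * nUsers
    dItemBiases = [0.0] * nItems
    dUserGamma = [[0.0] * K for _ in range(nUsers)]
    dItemGamma = [[0.0] * K for _ in range(nItems)]
    # Each observed (u, i) contributes to Alpha, Beta_u, Beta_i, and
    # every dimension k of Gamma_u and Gamma_i
    for u, i, r in ratings:
        diff = 2.0 / N * (prediction(u, i) - r)  # d(MSE)/d(prediction)
        dalpha += diff
        dUserBiases[u] += diff
        dItemBiases[i] += diff
        for k in range(K):
            dUserGamma[u][k] += diff * itemGamma[i][k]
            dItemGamma[i][k] += diff * userGamma[u][k]
    # Gradients of the regularizer
    for u in range(nUsers):
        dUserBiases[u] += 2 * lamb * userBiases[u]
        for k in range(K):
            dUserGamma[u][k] += 2 * lamb * userGamma[u][k]
    for i in range(nItems):
        dItemBiases[i] += 2 * lamb * itemBiases[i]
        for k in range(K):
            dItemGamma[i][k] += 2 * lamb * itemGamma[i][k]
    return dalpha, dUserBiases, dItemBiases, dUserGamma, dItemGamma
```

In a full implementation these derivatives would be flattened back into a single vector, mirroring the unpacking function, before being handed to the gradient descent library.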
Then finally, we can run our model and observe its performance. Things look okay: we're getting lower and lower values for the mean squared error. I haven't tested in this simple exercise whether we're overfitting, that is, whether our training error is much lower than our test error; that's of course something we could and should do if we'd like to implement this the correct way. So that's about it for our implementation of this latent factor model. On your own, you might try optimizing parts of this code, for example by using NumPy to make it more efficient. You might also experiment with different regularization parameters or regularization strategies for Alpha, Beta, and Gamma, and maybe try actually using a training, validation, and test set to validate the model correctly. You might also try different values of the parameter k, the number of preference and property dimensions your model has, to see what effect those actually have on the model's performance.
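As a rough illustration of that final training step, here's a sketch that swaps the gradient descent library for a plain stochastic gradient descent loop; this is a simplification of the lecture's approach, run on a hypothetical toy dataset, but it shows the mean squared error dropping as training proceeds:

```python
import random

random.seed(0)

# Hypothetical toy data; the lecture runs this on real rating data via a
# gradient descent library rather than the hand-rolled loop below
ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 1, 1.0)]
nUsers, nItems, K, lamb, lr = 2, 2, 2, 0.001, 0.05

alpha = sum(r for _, _, r in ratings) / len(ratings)
userBiases, itemBiases = [0.0] * nUsers, [0.0] * nItems
userGamma = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(nUsers)]
itemGamma = [[random.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(nItems)]

def predict(u, i):
    return (alpha + userBiases[u] + itemBiases[i]
            + sum(a * b for a, b in zip(userGamma[u], itemGamma[i])))

def mse():
    return sum((predict(u, i) - r) ** 2 for u, i, r in ratings) / len(ratings)

before = mse()
for epoch in range(200):
    for u, i, r in ratings:
        err = predict(u, i) - r
        # One stochastic gradient step per observation
        alpha -= lr * err
        userBiases[u] -= lr * (err + lamb * userBiases[u])
        itemBiases[i] -= lr * (err + lamb * itemBiases[i])
        for k in range(K):
            gU = userGamma[u][k]  # save before updating, so both updates use old values
            userGamma[u][k] -= lr * (err * itemGamma[i][k] + lamb * gU)
            itemGamma[i][k] -= lr * (err * gU + lamb * itemGamma[i][k])
after = mse()
```

As the transcript notes, a lower training error alone doesn't rule out overfitting; checking held-out error would require the train/validation/test split suggested above.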