We have discussed many details about the SVD.
Now, it's time to find out how to actually compute the PureSVD model with it.
One way would be to find a complete SVD of
a dense matrix and then truncate it to a smaller size.
This, however, is a computationally heavy task,
impractical even for modestly sized datasets.
Instead, you can use another technique based on the efficient Lanczos method.
The algorithm is iterative,
and the only thing it requires is a rule for
multiplying your matrix by an arbitrary dense vector,
and this multiplication can be done really fast,
for example, with the help of
efficient sparse matrix formats such as compressed sparse row storage, or CSR.
The overall complexity of this approach is linear in the number of
non-zero values of the original matrix and linear with respect to the matrix dimensions.
And you don't have to write this algorithm yourself.
Highly optimized implementations are
available out of the box in many programming languages,
such as MATLAB or Python.
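In Python, for instance, a minimal sketch might look like the following; it assumes SciPy's sparse svds routine, and the random matrix stands in for real rating data purely for illustration:

    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    # Stand-in for a user-item utility matrix in CSR format; zeros mark unknown ratings
    A = sp.random(10000, 2000, density=0.005, format='csr', random_state=42)

    # Truncated SVD: only the top-k singular triplets are computed,
    # using a Lanczos-type iterative solver under the hood
    U, sigma, Vt = svds(A, k=50)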
There is an implementation of the truncated SVD in Spark as well.
However, its current version doesn't support custom matrix-vector multiplication rules.
Let me show you why this feature is also important.
Recall that in order to deal with the unknown values,
we have replaced them all with zeros.
You might think that this imputation technique is too restrictive and that,
at least in some cases,
it could be more beneficial to use a different strategy,
for example, using average values instead of zeros,
as I mentioned previously.
One common misconception is that in this case,
the utility matrix becomes dense and impractical to work with.
In practice, however, this is not a problem at all and you can easily avoid it,
thanks again to the Lanczos method.
Let's see how it works in the following example with average user ratings.
You have your sparse utility matrix, A.
First, let's compute the average value of every row,
taking into account only its known elements.
This produces a dense vector. Let's call it b.
Now, your new matrix with the average values in place of
the unknowns can be represented by the sum of two other matrices.
The first one is just a dense matrix of the average values repeated across the columns.
It can be constructed as an outer product of your vector, b,
with a vector of all ones,
denoted by e. The second matrix, let's call it D,
has the same sparsity pattern as the matrix A,
and its non-zero values are equal to the difference between
the non-zero values of A and the corresponding values of b,
as shown in the formula below.
When you sum up these two matrices,
you get your new matrix A-hat.
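In symbols, the decomposition described above might be reconstructed roughly as follows (the exact notation shown on the slide may differ):

    \hat{A} = b\,e^{\top} + D,
    \qquad
    D_{ij} = A_{ij} - b_i \ \text{for the known entries of } A,
    \quad
    D_{ij} = 0 \ \text{otherwise}.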
Now, I hope you remember that for the Lanczos method,
all you need to know is how to multiply your matrix by an arbitrary vector, say v.
With a little bit of linear algebra,
you can show that this matrix-vector product consists of two terms.
The first one, D multiplied by v,
has the same computational complexity as the product of
your original matrix A with the vector v, so it doesn't add anything new.
The second term involves only vector multiplications and
its complexity is proportional to the number of items.
In a similar way,
you can show that the added complexity for the multiplication
with the transposed matrix is proportional to the number of users.
The overall computational complexity is then increased by a small increment,
linear with respect to the dimensions of the original matrix.
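Written out explicitly (using v for the forward product as above, and u as a label of my own for the transposed one), the two products might be expressed as

    \hat{A}\,v = D\,v + b\,(e^{\top} v),
    \qquad
    \hat{A}^{\top} u = D^{\top} u + e\,(b^{\top} u),

where the first terms cost as much as multiplying by A itself and the extra terms involve only vector operations.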
As you can see, this procedure avoids formation of a dense matrix along the way,
and it's almost as efficient as PureSVD with the zero-imputation strategy.
A similar trick can be applied to column averages or to whatever combination of
row and column averages you like.
It is also easy to implement.
Here, you can see the code for the previous example written in Python.
The left block of the code is just a computation of matrix D and the vector b.
On the right hand side,
the key part is the matrix-vector multiplication,
which is handled by an instance of
a special linear operator class provided by the SciPy package.
This instance is needed to imitate
the behavior of your new matrix when multiplied by a vector.
It avoids forming any dense matrices.
This operator is then used as an input to the standard SVD routine,
instead of the matrix itself.
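Since the code itself appears only on the slide, here is a minimal sketch of how the same idea can be expressed with SciPy; the random matrix A is just a stand-in, and the helper names are my own:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, svds

    # Sparse utility matrix A (users x items) in CSR format; zeros mark unknowns
    A = sp.random(1000, 500, density=0.02, format='csr', random_state=0)

    # Vector b: average of the known (non-zero) values in every row
    nnz_per_row = np.diff(A.indptr)
    b = np.asarray(A.sum(axis=1)).ravel() / np.maximum(nnz_per_row, 1)

    # Matrix D: same sparsity pattern as A, values A_ij - b_i at the known positions
    D = A.copy()
    D.data = D.data - np.repeat(b, nnz_per_row)

    # Linear operator imitating A_hat = b e^T + D without forming a dense matrix
    def matvec(v):
        v = np.asarray(v).ravel()
        return D @ v + b * v.sum()                   # A_hat v = D v + b (e^T v)

    def rmatvec(u):
        u = np.asarray(u).ravel()
        return D.T @ u + np.full(A.shape[1], b @ u)  # A_hat^T u = D^T u + e (b^T u)

    A_hat = LinearOperator(A.shape, matvec=matvec, rmatvec=rmatvec, dtype=np.float64)

    # The operator is passed to the truncated SVD routine instead of the matrix itself
    U, sigma, Vt = svds(A_hat, k=10)

The rmatvec rule is supplied as well because the solver also multiplies by the transposed matrix, as discussed above.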
This is it. I hope you feel the beauty and the power of linear algebra.
If you have a single machine with enough RAM to fit your data,
this can be a great way to build your recommendation model.
As long as you have your data in the appropriate sparse format,
the computation is very straightforward and blazingly fast.
If you don't need to implement any fancy imputation strategy,
and oftentimes it is not required,
and especially if your data does not fit into a single machine's memory,
you can also use Spark and its distributed implementation of the truncated SVD.
To summarize this lesson,
now you know how to make the computation of
the PureSVD model very efficient with the Lanczos method.
You also know how to use
various missing value imputation strategies and avoid constructing dense matrices.
Due to its simplicity,
PureSVD is suitable for a quick prototyping of your recommendation models,
especially if you have enough RAM to run it on a single machine.
A distributed implementation of the truncated SVD is
a difficult task, and the version currently available supports only the core functionality.