The previous lesson was full of new concepts.

They, however, give a good sense of how diverse and flexible hybridization methods are.

It is time to get to more practical aspects of hybridization.

Let's look into the simplest ensemble model based on the weighted scheme.

Let A-hat be the approximation of our original utility matrix A.

By the definition of the ensemble,

it is expressed as a weighted sum of the outputs of the S distinct recommendation models.

Omega-k here are the ensemble weights.
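The weighted scheme just defined can be written out in a few lines. This is only an illustration: the matrices below are random stand-ins for real model outputs, and the shapes are arbitrary.

```python
import numpy as np

# Stand-ins for the outputs A_k of S = 3 distinct recommendation models,
# each approximating the same utility matrix (here 4 users x 5 items).
rng = np.random.default_rng(0)
predictions = [rng.random((4, 5)) for _ in range(3)]

# The simplest choice from the lesson: uniform weights omega_k = 1/S.
S = len(predictions)
omega = np.full(S, 1.0 / S)

# A_hat = sum_k omega_k * A_k, the weighted-ensemble approximation.
A_hat = sum(w * A_k for w, A_k in zip(omega, predictions))
```

With uniform weights this reduces to a plain average of the model outputs.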

So how do you determine these weights?

How do you make your ensemble outperform any of its models individually?

You have different choices here.

In the simplest case, you can choose all the weights to be

uniform and equal to one over S. In practice,

it is possible to get an improvement even with these values.

You can also try to incorporate some domain or expert knowledge and based on that,

figure out what the reasonable values are.

For example, if it is known that the original matrix is extremely sparse,

you can manually lower the weights of the collaborative filtering models as less reliable, and increase the weights of the content-based models as more consistent.

The next important question is,

how do you actually verify that the result is improved with the selected weight values?

You can use your training data and evaluate the model on a small subset of it.

This means that you need to split your training dataset.

You randomly select 25% of the known observations and use them as a holdout set.

I denote it by H. The remaining

75% of the observations are used to actually train your model.

Once complete, you evaluate

the ensemble-based prediction against the actual holdout values.

Depending on the problem,

the evaluation measure for this task may vary.

In the simplest case of the rating prediction task,

you can go with MAE or MSE.

When you are satisfied with the weight values,

it is important to retrain your model on the entire training set not just 75% of it.
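The holdout procedure above might be sketched as follows, with a deliberately trivial stand-in model that predicts the mean training rating; the sizes and names are illustrative assumptions.

```python
import numpy as np

# Toy observed ratings, standing in for the known entries of the
# utility matrix.
rng = np.random.default_rng(42)
n = 100
ratings = rng.integers(1, 6, size=n).astype(float)

# Randomly put 25% of the observations into the holdout set H
# and keep the remaining 75% for training.
idx = rng.permutation(n)
holdout_idx, train_idx = idx[: n // 4], idx[n // 4 :]

# A hypothetical "model": predict the mean training rating everywhere.
prediction = ratings[train_idx].mean()

# Evaluate the prediction against the actual holdout values.
errors = ratings[holdout_idx] - prediction
mae = np.abs(errors).mean()   # mean absolute error
mse = (errors ** 2).mean()    # mean squared error
```

After the weights are chosen, the final model would be retrained on all n observations, as the lesson notes.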

More formal techniques can

also be applied to find more accurate values of ensemble weights.

For example, you could do a grid search.

Another option is a linear regression that minimizes the MSE metric.

This metric, however, is sensitive to noisy outliers.

A better approach would be to use gradient-based methods with the MAE, or any other variation of the robust regression methods.

Let's focus on the gradient method as you're already familiar with it.

As always, you first need to take care of the gradient of your loss function.

The partial derivatives of the MAE with respect to the ensemble weights are

easy to find as they are proportional to the sign function.

Note, it might also be a good idea to add a regularization term.

It could be for example,

the Euclidean norm of the vector Omega.

Now that you know what the full gradient looks like,

everything is ready to sketch the optimization algorithm.

At the starting point of the gradient method,

the weights should be initialized with a uniform value in

order to avoid favoring any particular ensemble model.

The iterations then continue until the error stops decreasing or the maximum number of iterations is reached.

As the problem is non-convex, the algorithm is only guaranteed to converge to a local minimum.

And again, once the new weight values are computed,

don't forget to retrain the model on the entire training data.
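The weight-tuning loop just described could be sketched like this; the synthetic data, step size, and regularization strength are all illustrative assumptions, not prescribed values.

```python
import numpy as np

# P holds the S models' predictions on the holdout set (one column per
# model); r holds the actual holdout ratings. Both are synthetic here.
rng = np.random.default_rng(1)
n, S = 200, 3
P = rng.random((n, S)) * 5
r = P @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, n)

lam, lr, max_iter, tol = 1e-3, 0.01, 500, 1e-8

# Uniform initialization, so no ensemble model is favored at the start.
omega = np.full(S, 1.0 / S)
prev_err = np.inf
for _ in range(max_iter):
    residual = r - P @ omega
    # MAE loss plus an L2 (Euclidean-norm) regularizer on omega.
    err = np.abs(residual).mean() + lam * omega @ omega
    if prev_err - err < tol:  # stop when the error stops decreasing
        break
    prev_err = err
    # The MAE partial derivatives are proportional to the sign of the
    # residuals; the regularizer contributes 2 * lam * omega.
    grad = -(P * np.sign(residual)[:, None]).mean(axis=0) + 2 * lam * omega
    omega -= lr * grad
```

In a real pipeline, the learned omega would then be used to retrain the ensemble on the full training set.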

There is another important class of hybridization methods, namely randomness injection.

It has many principles in common with the famous random forest classification.

The key idea of the method is simply to bring randomness into the model behavior.

Then, by incorporating several randomized models,

one can achieve greater quality.

Two notable classes of the models suitable for

this method are neighborhood-based models and matrix factorizations.

Randomness in the neighborhood models is achieved by a considerable extension of the neighborhood search space, from which the k neighbors are then selected randomly.

Randomness in the matrix factorization methods follows directly from the random initialization.

As a side note,

randomness plays an important role in many machine learning algorithms.

For instance, random projections are used to quickly approximate kernel functions.

A more relevant example for our course is the randomized SVD algorithm.

Unlike SVD++, this is a true singular value decomposition.

Unlike the standard SVD though, it is computed by random projections rather than by the Lanczos algorithm.

On the other hand, it operates with matrix-vector multiplications similarly to the SVD, which makes it very convenient and computationally efficient.
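A minimal sketch of randomized SVD via random projections could look as follows; the matrix size, target rank, and oversampling amount are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((100, 40))  # stand-in for a data matrix
k = 10                     # target rank

# Range finding: only matrix products with A are needed. Project A onto a
# random subspace (with a small oversampling of 5) and orthonormalize.
Omega = rng.standard_normal((A.shape[1], k + 5))
Q, _ = np.linalg.qr(A @ Omega)  # orthonormal basis for the range of A

# Exact SVD of the small projected matrix B = Q^T A, then lift back.
B = Q.T @ A
U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_small  # approximate left singular vectors of A

# Rank-k approximation assembled from the leading singular triplets.
A_approx = U[:, :k] * s[:k] @ Vt[:k]
```

The expensive step is a single pass of matrix products with A, which is what makes the method attractive at scale.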

Time to get back to our hybrid recommenders.

Instead of building an ensemble of

several different models computed for a single data set,

you could also perform the reverse task.

Build an ensemble of a single model

computed over several subsamples of the original data.

The latter technique is called Bagging.

There are four different bagging methods that could be

adapted for collaborative filtering models.

Without going into details,

it is important to emphasize that in these methods,

you are required to build a number of subsamples of the original data.

And therefore, additional storage space is needed.

In the first three of these methods,

the subsamples are similar to the original data in terms of the number of elements,

which makes the applicability of these methods in large-scale settings questionable.

Only the last method, entry-wise subsampling, allows you to generate samples with a lower number of non-zero elements than in the original data.

In this method, a collaborative filtering model is applied to

each subsample, and the final prediction is simply averaged across all of them.

The first three methods follow similar ideas

with some nuances in how the averages are computed.
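A toy sketch of entry-wise subsampling follows, with a stand-in "model" that just predicts the mean of the entries it sees; the sizes, keep rate, and number of subsamples are all illustrative.

```python
import numpy as np

# Known ratings, standing in for the non-zero entries of the data.
rng = np.random.default_rng(5)
entries = rng.integers(1, 6, size=500).astype(float)

n_bags, keep = 10, 0.5
predictions = []
for _ in range(n_bags):
    # Each subsample keeps a random ~50% of the known entries,
    # so it has fewer non-zero elements than the original data.
    mask = rng.random(entries.size) < keep
    subsample = entries[mask]
    # A collaborative filtering model would be fit here; this stand-in
    # just predicts the subsample mean.
    predictions.append(subsample.mean())

# The final prediction is simply the average across all subsamples.
final_prediction = np.mean(predictions)
```

The storage cost the lesson mentions shows up here as the n_bags stored subsamples.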

As an important remark,

the question of balancing the quality of recommendations against the computational feasibility of a particular approach is not an easy one.

There is often a trade off between them.

This was a high-level description of the basic hybridization methods.

And I encourage you to check the materials referenced at

the bottom of the slide to get more details and see other examples.

Let me summarize the lesson.

You understand how weighted hybridization is performed.

You know at least several techniques for tuning ensemble weights.

You can justify a certain choice of the weight values in your model.

And you are familiar with randomization techniques.