[MUSIC] In the previous section we pointed out that, even if we define a proper statistical and economic framework, it is always difficult to establish a relationship between the partial indicators we define for the composite index and the variables we need to choose. Here we are going to develop this idea of the relationship between the variables and the model a bit further.

The choice of variables is always difficult because, in many cases, there is a mismatch between the data available and the variables we need to introduce in our models. So we can imagine many different situations, as we write down in the next slide. We might have different data sources for the same indicator. We may have, as you realized in the previous section, some indicators that do not have a direct relationship with a single variable. And then we have a set of different problems with the data itself: data with measurement errors and, for example, problems of missing data.

In all these situations, I would like to remind you that it is necessary in our work to look for reliable sources of data. As we pointed out in previous sections, we would like to emphasize that it is advisable to rely on data coming from statistical offices. These data usually meet standard benchmarks of quality, and they are very useful and widely available.

Given this, let me focus my attention on one of the most important problems we face when dealing with this type of data: what we call the missing data problem. Sometimes we find a variable that approximates a partial indicator increasingly well. However, this variable, which can be observed over time or across individuals, is not fully observed: at some points, observations are missing. Why? This happens very often, for example, with salaries.
Some individuals are reluctant to answer some questions, and other individuals simply lie for private reasons. In these cases we find ourselves with a variable where some of the observations are missing. What to do in this case? To tell you the truth, several methodologies exist to deal with missing data. But, precisely because there are so many different methodologies, there is not one single proper answer to this problem.

In the next slide you will find a broad classification of the techniques devoted to solving the missing data problem. Basically, we will distinguish among three broad families of techniques: what we call case deletion, then single imputation, and then multiple imputation. Case deletion basically drops the missing observation, or assigns it a zero value. Single imputation consists of constructing a predictive model, based on the available data, that assigns values to the missing observations according to different techniques we will develop in further slides. And finally there is what we call multiple imputation, which consists in drawing several samples from the same predictive distribution, conditional on the observed data; we then take some moments of these generated samples, and these estimated moments substitute the missing observations.

Within the framework of single imputation, we are going to distinguish between implicit modelling and explicit modelling. What is the difference between implicit and explicit modelling? In the implicit modelling case, the predictive distribution is implicit in the data: it is not expressed or specified by the researcher.
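As a toy illustration of two of these families, consider the sketch below. It is only an example under assumed data: the salary figures are hypothetical, and the single imputation shown is the simplest unconditional flavour, where the sample mean plays the role of the predictive value.

```python
import numpy as np

# Hypothetical salary responses; np.nan marks a non-response.
salaries = np.array([32000.0, 41000.0, np.nan, 38000.0, np.nan, 29000.0])

# Case deletion: drop every record with a missing value.
observed = salaries[~np.isnan(salaries)]

# Single imputation (unconditional flavour): replace each missing
# value with an empirical moment of the observed data, here the mean.
imputed = np.where(np.isnan(salaries), observed.mean(), salaries)

print(len(observed))  # 4 records survive case deletion
print(imputed[2])     # 35000.0, the mean of the observed salaries
```

Note the trade-off already visible here: case deletion shrinks the sample, while imputation keeps its size but injects values the respondents never reported.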
In the explicit modelling approach, by contrast, the predictive distribution is explicitly defined by the researcher. This is a very important difference in terms of how the missing data are assigned.

Within the implicit modelling framework we distinguish, as you will see in the slide, three different types of techniques: what we call hot deck, substitution, and cold deck. The hot deck technique means, basically, that the missing observation is replaced by another observation that is already part of the sample. In the substitution technique, only part of the available sample is actually used; a missing observation is replaced by the value of a unit taken from the part that was not included in the sample. And finally, the cold deck technique substitutes the missing value with a value that does not come from the sample at all: it is taken from another study or from another sample.

In the explicit modelling approach, we distinguish between basically three groups of techniques: what we call unconditional imputation, regression imputation, and expectation maximization. The unconditional imputation techniques consist basically in estimating some empirical moment from the data, for example the average or the standard deviation, and then replacing the missing value with one of these moments. For example, it is very commonly done that you replace the missing value with the expected value of the variable. In the case of regression imputation, missing observations are explained in a regression framework through other explanatory variables.
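A rough sketch of the hot deck idea follows. The data are hypothetical, and drawing donors uniformly at random is an assumption made for brevity; practical hot deck implementations usually match donors to recipients on auxiliary covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variable with two missing entries.
x = np.array([5.0, np.nan, 7.0, 4.0, np.nan, 6.0])
missing = np.isnan(x)
donors = x[~missing]  # values already observed in the same sample

# Hot deck: each missing value is replaced by a donor value
# drawn at random from the observed part of the sample.
hot_deck = x.copy()
hot_deck[missing] = rng.choice(donors, size=missing.sum())
```

Because every imputed value is a real observed value, hot deck never produces impossible entries, which is one reason statistical offices have long favoured it.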
Then, using the information contained in the sample, we estimate by some statistical technique the relationship between the variable with missing observations and the explanatory variables, and we substitute each missing value by the predicted value from this regression. And finally, expectation maximization basically consists in treating the missing observations as unknown parameters and then, in a maximum likelihood framework, estimating these missing observations as parameters.

Well, finally, in the last slide of this part of the course, we describe the properties and characteristics of multiple imputation techniques. You have there enough information to learn about this technique, but let me just point out its basic idea: a predictive distribution, based on the observed data, is able to draw or generate several samples of values for the missing observations. With these several samples we are able to compute moments, and these moments are going to replace the missing values. Recently, it has become very fashionable and very common to use what we call Markov Chain Monte Carlo methods in order to compute and evaluate these missing observations. [MUSIC]
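The regression and multiple imputation ideas described above can be sketched together on a toy dataset. Everything here is an assumption for illustration: the data are invented, the model is a simple straight line fitted by least squares, and the multiple imputation step is only a caricature (noise added to the regression predictions, then averaged) rather than a full proper-imputation procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y is partly missing, x is fully observed.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, np.nan, 8.2, np.nan, 12.1])
missing = np.isnan(y)

# Regression imputation: fit y = a + b*x on the complete cases,
# then substitute each missing y by its regression prediction.
b, a = np.polyfit(x[~missing], y[~missing], 1)
y_reg = y.copy()
y_reg[missing] = a + b * x[missing]

# Multiple imputation (sketch): draw several imputed samples by adding
# residual-scale noise to the predictions, then combine by averaging.
resid_sd = np.std(y[~missing] - (a + b * x[~missing]))
draws = [a + b * x[missing] + rng.normal(0.0, resid_sd, missing.sum())
         for _ in range(20)]
y_multi = y.copy()
y_multi[missing] = np.mean(draws, axis=0)
```

The point of the noise term is exactly the lecturer's point about predictive distributions: single regression imputation pretends the prediction is known with certainty, whereas multiple imputation propagates the uncertainty of the missing values through several generated samples.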