Hi. In this module I'm going to start talking about multi-voxel pattern analysis. So, let's return to the data processing pipeline that we've been using throughout this course and the previous one. We talked about data acquisition and reconstruction. We talked about pre-processing and experimental design. And we've talked about data analysis. In data analysis we talked about human brain mapping and localizing brain activation, and we talked about connectivity, both functional and effective connectivity. And now we're going to start focusing on prediction and classification. There is a growing interest in using fMRI data to classify mental disorders and predict the early onset of disease. In addition, researchers are interested in developing methods for predicting stimuli directly from functional data. This opens the possibility of inferring information about subjective human experiences directly from brain activation patterns. Predicting brain states is challenging and requires the application of novel statistical and machine learning algorithms. Various techniques have been successfully applied to fMRI data, in which a classifier is trained to discriminate between brain states and then used to predict the brain states in a new set of fMRI data. When applied to fMRI data, the result is a pattern of weights across brain regions that can be applied prospectively to new brain activation maps to quantify the degree to which the pattern is expressed during a particular type of event. Here's a little cartoon image where we have brain activation in a vector x over the different voxels. So x is just a vector of length v. And we have brain weights, w, also over the different voxels. We take the dot product of the two, and if this value is bigger than zero, we classify it to group A; if it's less than zero, we classify it to group B. So here the issue is to find the proper weights that will classify people appropriately into these two different groups.
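The dot-product rule just described can be sketched in a few lines of Python. The weight and activation values below are made up purely for illustration; in practice w would be learned from data, as we'll see later:

```python
import numpy as np

def classify(x, w):
    """Dot-product classifier: group A if w . x > 0, group B otherwise."""
    return "A" if np.dot(w, x) > 0 else "B"

# Toy example with v = 4 voxels; these numbers are illustrative only.
w = np.array([0.5, -1.0, 2.0, 0.1])   # pattern of weights across voxels
x = np.array([1.0, 0.2, 0.8, -0.3])   # a new brain activation map

print(classify(x, w))  # w . x = 1.87 > 0, so this map goes to group A
```

The same weight vector w can then be applied prospectively to any new activation map of the same length.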
The application of machine learning methods to fMRI data is often referred to as multi-voxel pattern analysis, or MVPA. Instead of focusing on single voxels, MVPA uses pattern-classification algorithms applied to multiple voxels simultaneously. So here's an example of what's going on. We take the brain and we split it up into a number of different voxels. Now we look at the brain activation strung out in a vector of length v, where v is the number of voxels. We look at the activation vector when you're viewing one stimulus, in this case a picture of a face, and when you're viewing another stimulus, say a picture of a house. Now the idea is that if we repeatedly show you pictures of faces and houses, we might be able to train a classifier that's able to discriminate between activation when looking at faces versus when looking at houses. So this is the sort of data that we often use in MVPA: we have a vector of activation values and we have some sort of outcome. In this case it's a binary outcome, faces or houses. So why is the multivariate approach useful here? Let's say we have activation over two different voxels, Voxel 1 and Voxel 2. And let's say that we measure activation over these two voxels when the subjects are looking at faces and when they're looking at houses. And we do each of these things six times: you look at six faces and you look at six houses. So let's say that when you're looking at faces, you get these little purple dots here, which tend to have high activation in Voxel 2 but low activation in Voxel 1. In contrast, when you're looking at houses the activation is very high in Voxel 1 but low in Voxel 2. In this case it's very easy to separate which activations correspond to faces and which correspond to houses just by looking at the voxels univariately. Here there's a clear difference in the distribution in Voxel 1.
So in Voxel 1 the distribution is higher for houses versus faces, and for Voxel 2 the opposite is true. So in this case, it's very easy to classify these different conditions based on univariate analysis. However, let's say that the pattern instead looked like this, where the values on the first and second voxel aren't as clear-cut anymore. Here, if we look at the marginal distributions of each voxel, they're overlapping for both faces and houses. So looking at each voxel separately, having a high value in Voxel 2 doesn't really tell us much about whether you're looking at a face or a house, and the same is true of Voxel 1. Instead we need to use information from both of the voxels. And by making a scatterplot like the one seen here, it's very easy to separate faces and houses from each other by drawing a line like this. So that's where our classifier would go: if you're below this line you're looking at a house, if you're above this line, you're looking at a face. But to do this we need to use multivariate analysis. It's not enough to look at each voxel in isolation. Sometimes a straight line isn't enough to classify them, but we can still get a separation by making a non-linear boundary between them. We'll talk about that as we move along. But in general, by looking at the relationship between the two voxels, we can now separate between faces and houses as different stimuli. In multi-voxel pattern analysis the goal is to determine the model parameters that allow for the model's accurate prediction of new observations. So the idea is, how can you make a classifier that predicts very well in subsequent analysis whether you are looking at a face or a house, in our little example? Here we seek to create rules that can be used to categorize new observations. In contrast, methods such as the GLM, which we've talked about a whole lot in the previous course, seek to determine the model parameters that best fit the data at hand.
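This two-voxel situation can be made concrete with a small numerical sketch. The activation values here are made up to mimic the scatterplot: each voxel's marginal range overlaps between the two classes, yet a straight line through the plane separates faces from houses perfectly:

```python
import numpy as np

# Made-up activations over two voxels: columns are Voxel 1, Voxel 2.
faces  = np.array([[1, 2], [2, 3], [3, 4]])
houses = np.array([[2, 1], [3, 2], [4, 3]])

# Each voxel alone is ambiguous: the marginal ranges overlap.
print(faces[:, 0].min(), faces[:, 0].max())    # Voxel 1 for faces:  1..3
print(houses[:, 0].min(), houses[:, 0].max())  # Voxel 1 for houses: 2..4

# But the line "Voxel 2 = Voxel 1" separates the classes perfectly:
# above the line means face, below means house.
def predict(p):
    return "face" if p[1] - p[0] > 0 else "house"

print([predict(p) for p in faces])   # all "face"
print([predict(p) for p in houses])  # all "house"
```

So a rule using both voxels jointly succeeds where each voxel in isolation fails, which is exactly the motivation for the multivariate approach.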
So this is a fundamentally different objective that we're working on in MVPA. In MVPA, a classifier is defined as a function f(·) that takes the values of some observed features, which in our example would be the activation over all the voxels, and predicts to which class the observation belongs. This could be whether you're looking at faces or houses, whether you're a patient with schizophrenia or a healthy control, or whether you're at risk for early-onset Alzheimer's or not. Throughout this, let's denote the set of features by x, which is just a vector of length v; say, for the sake of argument, it's a vector of activation over v different voxels of the brain. And let's let the class label be y. So, in the case of faces versus houses, y is equal to zero when you're looking at faces and equal to one when you're looking at houses. It could also be a continuous measure as well. And then the goal here is to find the classifier f, so that we can take the value f(x) and predict the class label. So we get a y hat, which is a prediction of your class. Now, a classifier has a number of parameters w that need to be estimated, or in the machine learning nomenclature, learned. The learning is typically performed on a subset of the observations called the training data set. The learned classifier models the relationship between the features and class labels in the training data set. And then once you've trained it, the classifier is evaluated using an independent data set, which is called the test data. If the classifier truly captures the relationship between the features and the classes, it should be able to predict the class labels for data it hasn't seen before. That's the idea of holding out the test data set and seeing whether the classifier does a good job on data it hasn't been exposed to before.
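To make the "learning the parameters w" step concrete, here is a minimal sketch using a perceptron, one of the simplest linear classifiers. This is an illustrative choice, not the specific algorithm discussed in the course, and the training data values are made up (2 voxels, labels coded 0 = face, 1 = house as above):

```python
import numpy as np

def train_perceptron(X, y, n_epochs=20, lr=0.1):
    """Learn weights w and bias b so that sign(w . x + b) predicts the class.
    y is coded 0/1, e.g. faces (0) vs houses (1)."""
    n, v = X.shape
    w, b = np.zeros(v), 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Perceptron update: nudge w toward misclassified examples.
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

def predict(X, w, b):
    """Apply the learned classifier f(x) to get predicted labels y-hat."""
    return (X @ w + b > 0).astype(int)

# Toy training data: 4 observations over 2 voxels (values made up).
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0], [3.0, 2.0]])
y_train = np.array([0, 0, 1, 1])

w, b = train_perceptron(X_train, y_train)
print(predict(X_train, w, b))  # [0 0 1 1] -- recovers the training labels
```

The key point is that the weights w are not specified by the analyst; they are learned from the training observations.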
And so, the accuracy of the classifier measures the fraction of observations in the test data for which the correct label was predicted. So here's an illustration. Again, we have features, which in this case are voxels. We have measurements over v different voxels of the brain, and that's in the vector x, and we have a class label, which is y. Again, let's just say it's faces or houses. So now we have a number of different observations, and we can write them in a matrix format: each row of the data matrix is a different observation over these v voxels, and each element of the vector y is the class label corresponding to that observation. This full data set is split into two parts, training and test data. So we take a portion of the data and allocate it as the training data set, and a portion as the test data set. We take the training data and do further analysis on it, while the test data set is put away, not to be seen by the analyst; we put the test data aside along with its true labels. Now, using the training data set, we train the classifier, and then we apply this classifier to the test data to get predicted labels. We can then compare these predicted labels with the true labels to see how well our classifier is working. And so, that's one of the principles behind MVPA: we use these hold-out sets in order to assess the accuracy of our classifier on independent data. Okay, so that's just a little bit of a feel for how to go about doing an MVPA analysis. In the next two modules, I'm going to walk you through the different steps of how to perform a simple MVPA analysis. Okay, I'll see you then. Bye.
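The whole hold-out procedure can be sketched end to end. Everything here is illustrative: the data are simulated (two well-separated classes over two voxels), and the classifier is a simple nearest-class-mean rule standing in for whatever classifier one actually trains:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data set: 20 observations over v = 2 voxels, labels 0/1.
X = np.vstack([rng.normal([1, 3], 0.5, (10, 2)),   # class 0 ("faces")
               rng.normal([3, 1], 0.5, (10, 2))])  # class 1 ("houses")
y = np.array([0] * 10 + [1] * 10)

# Split the full data set into training and test portions;
# the test rows (and their true labels) are put aside.
idx = rng.permutation(20)
train, test = idx[:14], idx[14:]

# Train on the training data only: here, store each class's mean pattern.
mean0 = X[train][y[train] == 0].mean(axis=0)
mean1 = X[train][y[train] == 1].mean(axis=0)

def predict(x):
    """Assign x to the class whose mean pattern is closest."""
    return 0 if np.linalg.norm(x - mean0) < np.linalg.norm(x - mean1) else 1

# Apply the trained classifier to the held-out test data,
# then compare predicted labels with the true labels.
y_pred = np.array([predict(x) for x in X[test]])
accuracy = (y_pred == y[test]).mean()  # fraction of correct test labels
print(accuracy)
```

Because accuracy is computed only on observations the classifier never saw during training, it gives an honest estimate of how well the learned pattern generalizes.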