Hi, in this module and the next, we'll talk about the steps required to perform an MVPA analysis. The process of performing MVPA follows a series of steps: first, we define features and classes; next, we perform feature selection; then we choose a classifier, train and test the classifier, and finally examine the results. In the next two modules, we're just going to walk through each of these steps.

When defining features, features are the variables used as predictors in your model. There are many possible choices of what information could be used as features in an MVPA analysis. It could be the raw fMRI data over both space and time; it could be the fMRI data averaged over a block; it could be the beta values from a GLM analysis, which correspond to the activation; or it could be the average of several voxels in a region of interest. So, it sort of depends on what your goals are.

The choice of which outcomes to use depends upon the research question. There are two basic types of outcomes that we often use in MVPA analysis. The first is categorical outcomes, and these are predicted by classifiers. Here this might be the stimulus class, say whether we're looking at faces or houses, or it could be the subject's response or decision, such as whether you decide to purchase an item in an economic task or not. The other basic type is continuous outcomes, and these are predicted by regression models. This could be your emotional ratings during a certain task, it could be your age, or it could be criterion scores, such as scale items used in clinical diagnosis. Any of these can be used as the outcome in your MVPA analysis, and which type you use determines the kind of classifier or regression model that's appropriate.
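To make the feature-definition step concrete, here is a minimal sketch in Python/NumPy of how one might assemble per-trial voxel patterns (e.g., beta values or block-averaged data) into a feature matrix and label vector. The function name and inputs are illustrative, not from the lecture:

```python
import numpy as np

def build_feature_matrix(trial_data, labels):
    """Stack per-trial voxel patterns into an (n_trials, n_voxels) feature
    matrix X and an (n_trials,) label vector y.

    trial_data : list of 1-D arrays, one voxel pattern per trial
                 (e.g. beta values, or fMRI data averaged over a block)
    labels     : list of outcome labels, e.g. 'face' / 'house' for a
                 categorical outcome, or numbers for a continuous outcome
    """
    X = np.vstack(trial_data)  # rows = observations, columns = features
    y = np.asarray(labels)
    return X, y
```

For example, two trials with five voxels each would yield a 2 x 5 matrix X, with the rows as observations and the columns as features.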
So, in fMRI data, the number of features is typically many times larger than the number of observations, because oftentimes we might use all the voxels of the brain as features, and we essentially never have that many observations. Hence, it's usually beneficial to reduce the number of features through some sort of feature selection. This could involve using only voxels from a particular region of interest, applying dimension-reduction techniques such as SVD or PCA, as we talked about in previous modules, or using some sort of meta-analysis to choose which voxels are the most important.

Note that it's not permissible to select voxels that appear to distinguish between classes using information from the entire data set. You can't look at the entire data set, see which voxels distinguish between, say, faces and houses, and use those for feature selection, because then information in the test data set may affect the learning of the classifier and bias subsequent accuracy measures. So, you're going to get much more accurate results than you really should have gotten; in other words, a biased estimate. Feature selection must be based on the training data only.

There are many types of classifiers that you may choose to use, and they vary in the kinds of statistical relationships that they are able to detect. Oftentimes, we discriminate between linear and non-linear classifiers. Here's an example of a linear classifier, where we have red and blue dots and a straight line is able to discriminate between the two. In other cases, we might instead use a nonlinear classifier: here, a wiggly curve that now discriminates between the two classes. So, a linear classifier is often written in the format w transpose x plus b: if this quantity is bigger than 0, you belong to one class, and if it's less than 0, you belong to the other class.
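The linear decision rule at the end can be written out in a few lines of NumPy. This is a hypothetical helper, just to make the rule concrete:

```python
import numpy as np

def linear_classify(x, w, b):
    """Linear decision rule from the lecture: assign class +1 if
    w'x + b > 0, and class -1 if w'x + b < 0."""
    return 1 if np.dot(w, x) + b > 0 else -1
```

With w = [1, 1] and b = 0.5 (the example used later in the lecture), a point like (1, 1) falls on the positive side and a point like (-1, -1) on the negative side.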
Here, w is a series of weights: if x is a measurement over a number of voxels, w contains the weights corresponding to those voxels. So, linear classifiers tend to have very nice interpretations in MVPA analysis. Mathematically, if x is a V-dimensional vector, the equation w transpose x plus b = 0 defines a (V-1)-dimensional hyperplane. In two dimensions this is a straight line, in three dimensions it's a plane, and in V dimensions it's a (V-1)-dimensional hyperplane; that's what it's called in the mathematical construct. Here, w is a V-dimensional vector of weights, and b is the threshold value, so it's simply a scalar. The inner product of two vectors is 0 when they are orthogonal to each other, so the equation w transpose x = 0 defines a line orthogonal to w. And here we see an example of that: a simple two-dimensional case, since V = 2, where the classifier is just a straight line. Here w is equal to [1 1] and b is equal to 0.5, and that defines this line here, orthogonal to w.

There exist many types of linear classifiers, that is, different ways of choosing values for w that separate between the classes of dots. Some examples include logistic regression, Gaussian naive Bayes, Fisher's linear discriminant analysis, linear support vector machines, and classification trees.

So, let's focus on support vector machines. Support vector machines are designed to maximize the margin around the separating hyperplane. The idea is that if there are no points near the decision surface, then there are no uncertain classification decisions. In the two-dimensional problem that we see at the bottom here, we want to find a line that separates the two classes, but there are many possible lines that we could choose. What the support vector machine does is choose the line so that the two margin boundaries, the red and blue lines in the figure, are as far apart from each other as possible; that is, you maximize the margin around the separating hyperplane. So, we choose our line such that we maximize this distance, creating a zone between the two clouds of dots that is as big as possible.

So, how do we find the line that maximizes the separation between these two classes? Well, it involves solving a convex optimization problem, which can be solved as a quadratic programming problem. This is implemented in many common software packages, such as MATLAB and R, or whatever it is you wind up using.

So, data sets that can be separated perfectly by a linear boundary are said to be linearly separable. In the first example, all the red dots are on one side and all the blue dots are on the other side, so the problem is linearly separable. However, in the second example, the two clouds of points are not linearly separable: there is no straight line that separates the red and blue dots from each other perfectly, and this is often going to be the case in real-life situations. So, when the data are not linearly separable, we may still use a linear classifier by allowing certain data points to be on the wrong side of the boundary. However, by being on the wrong side of the boundary, they incur a penalty that increases with their distance from the boundary. To implement this, we introduce so-called slack variables to allow misclassification of difficult or noisy observations. So, here you see that in the support vector machine, we allow certain dots to lie on the wrong side of the boundary, but they incur a penalty in the function that we're trying to minimize. By doing this, you allow certain misclassifications while still being able to fit a linear classifier. Data sets that are linearly separable tend to be easier to work with.
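The slack-variable idea can be sketched numerically. In the standard soft-margin formulation, each point i gets a slack xi_i = max(0, 1 - y_i (w'x_i + b)), and the objective adds a penalty C times the sum of the slacks to the margin term. The function below is an illustrative sketch (not from the lecture) that just computes the slacks for given w and b:

```python
import numpy as np

def hinge_slack(X, y, w, b):
    """Slack variable xi_i = max(0, 1 - y_i (w'x_i + b)) for each point.

    xi == 0     : correct side of the boundary, outside the margin
    0 < xi <= 1 : correct side, but inside the margin
    xi > 1      : misclassified (wrong side of the boundary)

    X : (n, V) data matrix, y : (n,) labels in {+1, -1},
    w : (V,) weight vector, b : scalar threshold.
    """
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)
```

Points far on the correct side get zero slack and hence no penalty, while a point on the wrong side gets a slack greater than 1 that grows with its distance from the boundary, matching the penalty described above.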
If the data aren't linearly separable, we can always introduce slack variables, as mentioned above. Another option is to map the data onto a higher-dimensional space where the training set is separable with a linear classifier. So, here's a neat example. First, we have a one-dimensional data set that is linearly separable: we can find a point on the axis that separates the red points from the blue points, and that would be the solution. However, a second data set isn't linearly separable: we can't find a point on the axis that separates the two classes from each other. But if we take the value of x and square it, we can now plot the data in two dimensions, with x on the x-axis and x squared on the y-axis. Now suddenly we have a cloud of points for which we can define a linear classifier that perfectly separates the two classes. So, by making the problem a little bit more difficult, by increasing the dimension, we can now perfectly separate the two classes. Here's another example in two dimensions, where we have the red points inside a circle and the blue points outside of it. We can't fit this using a linear classifier, but if we increase the dimension by taking a transformation, we can now find a plane that separates these points.

Okay, so that's the end of this module. I started talking about some of the basic steps in MVPA analysis. We talked about defining features and defining outcomes, we talked about feature selection, and we talked about different classifiers. In the next module, I'll continue talking about how to perform an MVPA analysis. I'll see you then, bye.
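The one-dimensional example above can be sketched directly: map each point x to the pair (x, x squared), and a class lying near zero becomes separable from a class lying further out by a horizontal line in the lifted space. The code and data below are illustrative, with made-up class positions:

```python
import numpy as np

def lift(x):
    """Map 1-D points x to 2-D points (x, x^2), as in the lecture's
    example of mapping to a higher-dimensional space."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, x ** 2])
```

For instance, an inner class at x = -0.5 and 0.5 and an outer class at x = -2 and 2 cannot be split by any single threshold on x, but after lifting, the line "second coordinate = 1" separates them perfectly.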