[MUSIC] So this notion of variable importance. So one of the weaknesses of ensemble methods that we mentioned is interpretability. A single decision tree makes a lot of sense. If I have a whole set of decision trees all working in concert, it's a little less clear what's going on. And so if you're actually trying to use these models to gain some insight about the domain, that may be difficult with a random forest. All you get is the answers out, and they have high accuracy, but you don't really know why it works. Okay. So variable importance is a hedge against this. The idea is that, as an output of running this algorithm, you get a measure of how important to the accuracy the various attributes were. By the way, I'm using variable and attribute interchangeably. Okay. So how do you measure this? Well, as you're training each tree, you do this permutation experiment, and we've seen this before, at least I think we have. You scramble the values of that attribute, and what you'd expect is that once you scramble these, the accuracy, or the error rate, may change. Okay. And if it's an important attribute then it'll change a lot, and if it's not a very important attribute then it won't change very much. And so that's the idea. So if you scramble this and the error rate does not change very much, then it must not have been an attribute that was heavily used in the decision tree. Okay. Specifically, you measure the error increase after permuting these values. And you can plot, for all the attributes in the dataset, which ones came out to be most important by aggregating these across all of the trees. Okay. So this helps with interpretability of the overall random forest, because you get a sense of which attributes are really helping to make all the decisions. All right.
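As a rough sketch of that permutation experiment, here is one way it could look in Python. The data, the stand-in `model_predict` function, and the helper name `permutation_importance` are all illustrative assumptions, not the lecture's actual code; the point is just the scramble-one-column-and-remeasure-the-error loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the label depends entirely on column 0 and not at all on column 1.
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

def model_predict(X):
    # Stand-in for a trained tree/forest: it only ever looks at feature 0.
    return (X[:, 0] > 0).astype(int)

def permutation_importance(X, y, predict, n_repeats=10):
    base_error = np.mean(predict(X) != y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Scramble the values of attribute j, leaving the rest intact.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean(predict(X_perm) != y))
        # Importance = average increase in error after scrambling feature j.
        importances[j] = np.mean(errors) - base_error
    return importances

imp = permutation_importance(X, y, model_predict)
# Scrambling feature 0 hurts the error rate a lot; scrambling feature 1
# changes nothing, so its measured importance is zero.
```

In a real random forest this is averaged over the trees (typically on each tree's out-of-bag samples), which is what the aggregated importance plot summarizes.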
And so in some applications, in particular in some medical applications that have been mentioned, this lets you eventually get down to root causes rather than doing fire-and-forget treatments. All right. Fine. So, one other difference is that the Gini coefficient is often used. And this comes up not just in random forests; it comes up in variations on decision tree algorithms generally. And so we talked about entropy as a measure of impurity when you're making a decision at each node. You need to decide which attribute to split on that will give you the most reduction in uncertainty. And so you're looking for choices that give you very pure records to the left and very pure records to the right, where, say, if I split on female and male in the Titanic dataset and 100% of the records down the female branch survived, that branch would be perfectly pure, okay. So somewhere down the decision tree this may turn out to be the case, in which case that would be a strong indication to use the gender attribute to make the split. Okay, and so we captured this intuitive notion of impurity using entropy. Another calculation of the same intuitive notion is the Gini coefficient: one minus the sum of the squared class probabilities. Okay, and so think about what this is doing, about how it behaves. We're not going to show how to derive this, and there are actually varying notions of how to compute the Gini coefficient in general. But think about how it behaves. Say we have a binary class that's just survived and not survived, each with probability 0.5. Square each side and you get 0.25 + 0.25, which is 0.5, so one minus that is 0.5. Okay. But if you have 0.1 squared + 0.9 squared, that's 0.01 + 0.81 = 0.82, which is bigger, so one minus that, 0.18, is much smaller. Okay, and this makes sense, because if 90% of the values are one class and only 10% are the other, then this node is just more pure than one that is evenly split, so its impurity comes out lower.
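The arithmetic in that example is simple enough to check directly. Here is a minimal sketch of the Gini impurity calculation just described (the function name `gini` is my own label for it):

```python
def gini(probs):
    # Gini impurity: 1 minus the sum of squared class probabilities.
    return 1.0 - sum(p * p for p in probs)

gini([0.5, 0.5])  # → 0.5, the maximally mixed binary node
gini([0.9, 0.1])  # ≈ 0.18, a much purer node, so lower impurity
gini([1.0, 0.0])  # → 0.0, a perfectly pure node
```

Like entropy, it is zero for a pure node and largest when the classes are evenly split, which is why either can serve as the splitting criterion.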
And so you've still got a function that's 0 for a perfectly pure node and grows as the node gets more mixed, capturing the same notion of impurity. So it has the same qualitative property that entropy did, okay (though for two classes Gini tops out at 0.5 rather than 1). All right, so now you know both of those methods for measuring impurity. And you'll see both of them in practice in the decision tree implementations in the libraries. [MUSIC]
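For instance, in scikit-learn's decision tree implementation the impurity measure is selected with the `criterion` parameter; this small sketch (dataset choice and training-set scoring are just for illustration) fits the same data both ways:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for criterion in ("gini", "entropy"):
    # criterion picks the impurity measure used to choose each split.
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)
    scores[criterion] = clf.score(X, y)
```

On a dataset like this, both criteria produce trees that fit the training data essentially perfectly; the two measures usually lead to similar, though not always identical, splits.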