[MUSIC] So we talked about boosting and bagging and ensemble methods and drawing bootstrap samples. And some of these ideas we can put together and apply them to decision trees in order to create an ensemble method based on decision trees. And one such ensemble method is Random Forest, due to Breiman in 2001, and this one is really, really popular and really, really powerful. Okay, so it's a good one to know. So, k times you're going to repeat the following procedure, okay, where k is a parameter that you'll specify. You'll draw a bootstrap sample from the data set and then train a decision tree on that bootstrap sample as follows. Until the tree is some maximum size, which is usually set pretty conservatively, you'll iterate over the leaf nodes. And for each leaf node, select m attributes at random out of the p that are available. So you're not gonna consider all the attributes when you're deciding how to split a node. You're just gonna select m at random and choose among those. And then pick the best attribute, or split of the attribute, as usual using entropy or other methods, and we'll talk about one more, okay? So now you have a tree where each split considered only m of the attributes, and which was derived from only part of the data set. Now you measure the out-of-bag error. So what's the out-of-bag error? Well, when you draw a bootstrap sample, remember that you might get duplicates, right? So you have a dataset of size n, and you're gonna draw a bootstrap sample also of size n, but whenever you get a duplicate, that means some other data item was left out of the sample, okay. So take all those data items that were left out of the sample, and that becomes your test set, all right? So now evaluate the error on that test set, and you can use this for various purposes, okay.
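The training loop just described can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes scikit-learn is available and leans on `DecisionTreeClassifier`, whose `max_features=m` option gives exactly the behavior described, considering only m randomly chosen attributes at each split. The function name `train_random_forest` and parameter defaults are mine.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_random_forest(X, y, k=25, m=None, max_depth=None, seed=0):
    """Sketch of the random-forest loop: k bootstrap samples, one tree each,
    with an out-of-bag error estimate per tree."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees, oob_errors = [], []
    for _ in range(k):
        # Draw a bootstrap sample of size n (with replacement).
        idx = rng.integers(0, n, size=n)
        # Items never drawn are "out of bag" -- they become this tree's test set.
        oob = np.setdiff1d(np.arange(n), idx)
        # max_features=m: consider only m random attributes at each split.
        tree = DecisionTreeClassifier(max_features=m, max_depth=max_depth,
                                      random_state=int(rng.integers(1 << 30)))
        tree.fit(X[idx], y[idx])
        if len(oob) > 0:
            # Out-of-bag error: error rate on the left-out points.
            oob_errors.append(1.0 - tree.score(X[oob], y[oob]))
        trees.append(tree)
    return trees, oob_errors
```

Averaging the per-tree out-of-bag errors gives the running error estimate the lecture refers to, without ever setting aside a separate validation set.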
You don't necessarily need to use it to make decisions as part of the algorithm, but it's computed on the fly as part of the algorithm, and it serves various other purposes, okay? So, you can use it to estimate the strength, which is the inverse of the error rate. You can use it to measure the correlation between the random trees. Right, so each one of these is sort of a different tree, using potentially different attributes and a different subset of the data. But you can imagine that, just by chance, you might get a bunch of trees that all do the exact same thing and make all the same decisions, right? That would lower the power, lower the strength, of the overall ensemble classifier, and you don't want that. So this gives you a running measure of how correlated the trees are, okay? It also gives you a measure of variable importance, which we'll talk about in the next slide. Okay, so finally, at the end of this, you have k trees that are hopefully more or less independent, and now you can classify incoming data points by just a majority vote among them. [MUSIC]
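The final classification step, the majority vote among the k trained trees, can be written as a small helper. This is my own sketch, assuming each tree object exposes a `predict` method (as scikit-learn trees do); the function name `majority_vote` is illustrative.

```python
import numpy as np

def majority_vote(trees, X):
    """Classify each row of X by a majority vote among the trees' predictions."""
    # Row i of `votes` holds tree i's predicted labels for all points in X.
    votes = np.array([tree.predict(X) for tree in trees])
    preds = []
    for col in votes.T:  # one column per data point
        labels, counts = np.unique(col, return_counts=True)
        preds.append(labels[np.argmax(counts)])  # most common label wins
    return np.array(preds)
```

Ties are broken here toward the smallest label (a side effect of `np.unique` returning sorted labels); with an odd k and binary labels, ties cannot occur.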