Single trees are typically not competitive in terms of

predictive accuracy compared to many alternative predictive models.

In this video we introduce the general idea of bagging,

boosting and random forests.

Note that bagging and boosting are general ideas that can

be applied to many other particular models as well.

When applied to trees,

the basic idea is to grow multiple trees which are

then combined to yield a single prediction.

Combining a large number of trees can also result in dramatic improvements

of prediction accuracy at the expense of some loss of interpretation.

Bagging, boosting, and random forests are all straightforward to use in software tools.

Bagging is a general-purpose procedure for reducing the variance of a predictive model.

It is frequently used in the context of trees.

Classical statistics suggest that averaging a set of observations reduces variance.

For example, for a set of n independent observations,

each with variance σ²,

the variance of the mean of the observations is given by σ²/n.

Therefore, to reduce the variance of our predictions,

we can simply average the predictions from models trained on multiple datasets.
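This variance reduction is easy to check with a quick simulation. The following is a minimal sketch, assuming NumPy and arbitrary illustrative values σ = 2 and n = 25; the empirical variance of the sample means comes out close to σ²/n:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma, n = 2.0, 25       # each observation has variance sigma^2 = 4
num_repeats = 100_000    # number of simulated datasets

# Draw many datasets of size n and average each one.
samples = rng.normal(loc=0.0, scale=sigma, size=(num_repeats, n))
means = samples.mean(axis=1)

# The variance of the means is close to sigma**2 / n = 0.16.
print(np.var(means))
```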

While this is a straightforward idea it is not practical

since we generally do not have access to multiple training sets.

To resolve this issue,

we take repeated samples from the training set.

This is usually called the bootstrap.

We can then train the model on each bootstrap sample and average the results.

This procedure is what we call bagging.

We can also validate the model using the data not included in each bootstrap sample, the so-called out-of-bag observations.

Bagging can sometimes significantly improve the predictive performance of trees.
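The bagging procedure above can be sketched in a few lines with scikit-learn, whose `BaggingRegressor` fits one tree per bootstrap sample (trees are its default base model) and averages their predictions. The synthetic dataset here is only a stand-in for a real training set; `oob_score=True` evaluates the model on the observations left out of each bootstrap sample:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic regression data standing in for a real training set.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Bagging: fit one tree per bootstrap sample, then average their predictions.
bag = BaggingRegressor(
    n_estimators=100,   # number of bootstrap samples / trees
    oob_score=True,     # evaluate on the out-of-bag observations
    random_state=0,
)
bag.fit(X, y)

# R^2 estimated from the out-of-bag data.
print(bag.oob_score_)
```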

Note from our description here that bagging is a general idea,

and can in principle be applied to

many other predictive models such as linear regression and logistic regression.

Random forests are similar to bagging.

They build on the idea of bagging but try to reduce

the correlation among models built with different samples.

In the context of trees,

this is achieved by building trees on random subsets of the predictors.

That is, when building the trees we do not use all predictors.

Since the trees are not built on the same set of predictors,

they are less correlated,

which reduces the variance of the averaged prediction.
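In scikit-learn, this random subsetting of predictors is controlled by the `max_features` parameter of `RandomForestRegressor`; the sketch below, again on a synthetic stand-in dataset, restricts each split to 4 of 12 predictors, which is the key difference from plain bagging:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=12, noise=10.0, random_state=0)

# max_features limits how many predictors each split may consider;
# restricting it decorrelates the trees in the forest.
forest = RandomForestRegressor(
    n_estimators=200,
    max_features=4,     # consider only 4 of the 12 predictors per split
    random_state=0,
)
forest.fit(X, y)

# Predictions for the first three observations.
print(forest.predict(X[:3]))
```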

The last idea we discuss is boosting.

Again, this is a general idea that can be applied to many predictive models.

We briefly discuss it in the context of trees.

Similar to bagging, boosted trees are built on bootstrap data sets.

However, the boosted trees are grown sequentially.

Given the current model,

we fit a tree to the errors (residuals) from the model.

Combining the trees can potentially improve predictive performance.
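The sequential idea can be sketched by hand: start from a trivial model, repeatedly fit a small tree to the current residuals, and add a damped version of each tree to the running prediction. This is a minimal illustration, assuming scikit-learn trees and an arbitrary learning rate of 0.1:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)

learning_rate = 0.1
prediction = np.zeros_like(y)   # start from a model that predicts zero
trees = []

for _ in range(100):
    residual = y - prediction            # error of the current model
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)                # fit a small tree to the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

# Training SSE shrinks as trees are added to the ensemble.
print(np.sum((y - prediction) ** 2))
```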

Here, we show the cross-validation results for the Bodor housing data.

Since this is a regression problem where the target variable, the list price, is continuous,

we report the total sum of squared errors (SSE) on validation data.

With a 60:40 split, the SSE for the full tree is 64,013,420.

Pruning the tree slightly reduces error.

Building the random forest further reduces the error.

A more significant improvement is achieved by bagging and boosting, with

the lowest cross-validation error reported for boosting, which stands at 53,251,639.

This number is almost 20% lower than the one from the full tree.

We can see here that single trees are known to perform poorly in general.

Therefore, for practical predictive modeling,

we almost always use advanced techniques such as the boosting, bagging,

and random forests that we discussed here.

Let me briefly discuss the strengths and weaknesses of tree models.

In terms of strengths,

trees are easy to understand and versatile.

They can be used for both classification and regression.

We do not need to worry about variable selection as

in linear regression, since variable selection is automatic.

There is also no need for variable transformation which is yet another advantage.

Trees are also known to be robust to outliers and missing values.

However, trees also suffer from several weaknesses.

First, they are only appropriate for large datasets.

Second, since trees cannot capture the effects of different predictor variables in an additive manner,

they are not good at modeling linear relationships, and they are prone to overfitting.

Therefore, it's important to cross-validate and prune.

Finally, trees do not produce rigorous statistical insights like linear or logistic regression models do.

Overall, trees are powerful as

a predictive model and should be part of a modeler's toolbox.