In this video you'll learn how to test whether the predictors together, as a set, are significantly related to the response variable. This test is referred to as the overall F-test of a multiple regression model. Consider the example where we predicted popularity of cat videos, measured as number of page views, with the predictors cat age and hairiness, the latter rated on a scale from zero to ten. An overall test helps us decide whether cat age and hairiness taken together are related to video popularity in the population. As always, we start by specifying the null hypothesis. If there's no relation between the predictors and the response variable, this means that neither cat age nor hairiness helps to predict popularity. In other words, the regression coefficients for both these predictors will be 0. We can visualize this as a flat plane in a three-dimensional graph. The alternative hypothesis is that at least one of the predictors is related to the response variable. If cat age, hairiness, or both are related to video popularity, the plane will no longer be flat. In other words, at least one, several, or all of the regression coefficients will differ from 0. This alternative hypothesis is very general. If there is a relation between the set of predictors and the response variable, we still don't know which predictors contribute. To find out which predictors contribute, we'll follow up with individual tests of the regression coefficients later on. But for now we'll focus on the overall test. The overall test, like always, is associated with a number of assumptions that need to be met in order for the test to give valid results. These assumptions are linearity of the relation between each predictor and the response variable for each value of the other predictors, and normality, homoscedasticity, and independence of the residuals. Another, more technical requirement is that you need enough observations relative to the number of predictors. I'll discuss the assumptions and how to check them later on. 
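To make the setup concrete, here is a minimal sketch of fitting a two-predictor model like this one with ordinary least squares in Python. The data values below are invented purely for illustration; they are not the example data from the video.

```python
import numpy as np

# Hypothetical data for illustration only:
# views = b0 + b1*age + b2*hairiness + error
age       = np.array([1.0, 3.0, 5.0, 7.0, 9.0])       # cat age in years
hairiness = np.array([2.0, 8.0, 4.0, 9.0, 6.0])       # rated 0-10
views     = np.array([10.0, 25.0, 18.0, 30.0, 22.0])  # page views (thousands)

# Design matrix: a column of ones for the intercept plus the two predictors
X = np.column_stack([np.ones_like(age), age, hairiness])
coefs, *_ = np.linalg.lstsq(X, views, rcond=None)
b0, b1, b2 = coefs

# Under the null hypothesis, b1 and b2 are both 0 (a flat plane);
# the overall F-test asks whether the data let us reject that.
print(b0, b1, b2)
```

The fitted coefficients describe the tilted plane; the overall F-test then asks whether that tilt reflects a real relation in the population or just sampling noise.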
First, let's see how to perform the test. To compute the test statistic F, we take the regression and error sums of squares we saw earlier when we calculated R squared. We turn these sums of squares into variances and we divide them. First we divide the regression sum of squares, the variation in the response variable captured by our model, by k minus 1, where k is the number of parameters in the model, which equals the number of predictors plus 1 for the intercept. The regression sum of squares divided by k minus 1 is called the regression mean square. Now, don't be confused by this new term mean square; it's just another word for variance. It's the variance in the response variable captured by our model. To get the F value, the regression mean square is divided by the residual sum of squares divided by n minus k, the number of observations minus the number of parameters. The residual sum of squares represents the variation in the response variable not accounted for by our model. If we divide it by n minus k, we turn it into the residual mean square, or mean square error, often abbreviated MSE. Of course, this mean square error is no longer the variation but the variance of the residuals: the variance in the response variable that we failed to capture with our model. So the F-test statistic is the explained variance divided by the error variance. Here's an example of an F distribution. As you can see, the lowest possible value is 0, which occurs when the regression mean square equals 0, when our model captures none of the variation in the response variable. As our model captures more of the variation, the F value goes up. The exact shape of the distribution is determined by two separate degrees of freedom. The first equals the number of parameters in the model minus 1. The second equals the number of observations minus the number of model parameters. Again, the number of model parameters equals the number of predictors plus one for the intercept. 
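As a quick numeric sketch, the arithmetic behind the F statistic can be written out in a few lines of Python, using the example sums of squares that come up later in the video (regression sum of squares 50.5, residual sum of squares 18.3, five observations, two predictors plus an intercept):

```python
# Example values from the cat-video regression
ss_regression = 50.5   # variation captured by the model
ss_residual   = 18.3   # variation left unexplained
n = 5                  # number of observations
k = 3                  # parameters: 2 predictors + 1 intercept

ms_regression = ss_regression / (k - 1)  # regression mean square
mse           = ss_residual / (n - k)    # mean square error
f_value       = ms_regression / mse      # explained / error variance

print(round(ms_regression, 2))  # 25.25
print(round(mse, 2))            # 9.15
print(round(f_value, 2))        # 2.76
```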
Notice that we used these values to turn sums of squares into mean squares earlier. This is why the first degree of freedom is often referred to as the numerator, or regression, degree of freedom, and the second is often called the denominator, or error, degree of freedom. Once we've calculated the F statistic and the degrees of freedom, we can calculate or look up the associated p value. We don't have to worry about choosing the left tail or the right tail here. This is because the alternative hypothesis is non-directional: it only specifies that one or more predictors are related to the response variable, but not which ones and not in which direction. This means we always look in the right tail to obtain the probability of finding the calculated F value or a more extreme value. Suppose in our example we find a regression sum of squares of 50.5 and a residual sum of squares of 18.3. The regression mean square then equals 25.25, since we divide by k minus 1, which is 2. The mean square error equals 18.3 divided by n minus k, 5 minus 3, which is 2, giving 9.15. This gives us an F value of 2.76. The p value, calculated with statistical software, equals 0.266. If we use the table to look up the critical F value, we see that our calculated value of 2.76 does not exceed the critical value of 19.00. This means that we cannot reject the null hypothesis and cannot conclude that cat age or hairiness or both are related to video popularity.
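If you'd like to check these numbers yourself instead of using a table, SciPy's F distribution gives both the right-tail p value and the critical value. A small sketch with the degrees of freedom from the example:

```python
from scipy.stats import f

f_value = 2.76
df_num, df_den = 2, 2  # k - 1 and n - k from the example

# Right-tail probability: P(F >= 2.76) under the null hypothesis
p_value = f.sf(f_value, df_num, df_den)
print(round(p_value, 3))  # 0.266

# Critical value for alpha = 0.05
critical = f.ppf(0.95, df_num, df_den)
print(round(critical, 2))  # 19.0
```

Since 2.76 is well below the critical value of 19.00 (equivalently, 0.266 is well above 0.05), we fail to reject the null hypothesis, just as the table lookup showed.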