[MUSIC] Statistical interaction describes a relationship between two variables that is dependent upon, or moderated by, a third variable.
>> For instance, do you prefer ketchup or soy sauce? Obviously, your answer depends on what food you're eating. If you're eating sushi, you probably prefer soy sauce. If you're having a burger and fries, you're probably going to want ketchup.
>> In this case, the third variable is referred to as the moderating variable, or simply the moderator. The effect of a moderating variable is often characterized statistically as an interaction. That is, a third variable that affects the direction and/or strength of the relationship between your explanatory, or x, variable and your response, or y, variable. What if the population we're studying has different subgroups? Could it be that, as in the soy sauce and ketchup example, different subgroups have a moderating effect on our association of interest?
>> To explore this idea, we are going to use a hypothetical study and some made-up data. In our imaginary study, we're looking at two diets and their effects on weight loss. Diet A is a low-carbohydrate plan. Diet B is a low-fat plan. Our hypothetical study also recorded which exercise program participants chose: cardiovascular exercise or weight training.
>> Our variables of interest are diet and weight loss. We've added this third variable, exercise plan, to help us understand moderation, or statistical interaction. So what's the association between diet plan, A or B, our explanatory variable, and weight loss, our quantitative response variable? This table shows our hypothetical data: diet, weight loss, and exercise plan. Since we have a categorical explanatory variable, diet plan A or B, and a quantitative response variable, weight loss, we will need to use analysis of variance (ANOVA) to evaluate the association.
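To make the layout of such a table concrete, here is a minimal sketch of how the three variables might be held in a pandas data frame. The column names (Diet, Exercise, WeightLoss) and every value are assumptions invented for illustration; they are not the data from the study described here.

```python
import pandas as pd

# Invented values for illustration only; these are NOT the lecture's data.
# Column names are hypothetical as well.
data = pd.DataFrame({
    "Diet": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Exercise": ["Cardio", "Cardio", "Weights", "Weights",
                 "Cardio", "Cardio", "Weights", "Weights"],
    "WeightLoss": [18.0, 20.0, 9.0, 11.0, 6.0, 8.0, 12.0, 14.0],
})

# Diet (explanatory) and Exercise (possible moderator) are categorical;
# WeightLoss (response) is quantitative, measured in pounds.
data["Diet"] = data["Diet"].astype("category")
data["Exercise"] = data["Exercise"].astype("category")

print(data)
print(data.dtypes)
```

Each row is one participant, with one categorical explanatory variable, one categorical third variable, and one quantitative response.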
>> This Python model syntax should look familiar to you: I name my model, include the equals sign, and call the ols function from the statsmodels formula API. Within parentheses, I write my formula, which includes the name of my quantitative response variable, followed by a tilde, and then the name of my categorical explanatory variable. I indicate to Python that this is a categorical variable by adding a capital C and putting the variable name within parentheses. Then I print the model results using the summary function. In this diet and exercise example, the syntax will look like this. The resulting output from my analysis is shown here. As you can see, weight loss is our response, or dependent, variable. There are 40 observations in the data set. The F value is 12.00, and it is associated with a significant p value, that is, a p value less than 0.05. This tells us that there is an association between diet type and weight loss. To understand that association, we need to look at output from the groupby function. Here, I create a new data frame with the variables of interest and request means and standard deviations for weight loss by type of diet. As you can see, the average one-month weight loss for diet A is about 14.7 pounds, and the average one-month weight loss for diet B is about 9.3 pounds. So, in conjunction with the significant p value, we can say that diet plan A is associated with significantly greater weight loss than diet plan B. Here we show the finding graphically, as a bar chart with diet, the explanatory variable, on the x-axis and mean weight loss, our response variable, on the y-axis.
>> What about our third variable, exercise program? Would we get the same results in terms of the association between diet and weight loss for those participants using cardio and those participants using weight training?
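The one-way ANOVA and group summaries described in this segment can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's actual script: the column names Diet and WeightLoss are hypothetical, and the 20 values are invented, so the output will not match the F value or group means quoted above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented values for illustration only; NOT the lecture's 40-observation data set.
data = pd.DataFrame({
    "Diet": ["A"] * 10 + ["B"] * 10,
    "WeightLoss": [15.0, 17.0, 13.0, 16.0, 14.0, 15.0, 18.0, 12.0, 16.0, 14.0,
                   9.0, 11.0, 8.0, 10.0, 9.0, 10.0, 12.0, 7.0, 10.0, 9.0],
})

# One-way ANOVA via OLS: quantitative response ~ C(categorical explanatory).
model = smf.ols(formula="WeightLoss ~ C(Diet)", data=data).fit()
print(model.summary())

# New data frame with only the variables of interest, then
# means and standard deviations of weight loss by diet.
sub = data[["Diet", "WeightLoss"]]
means = sub.groupby("Diet")["WeightLoss"].mean()
sds = sub.groupby("Diet")["WeightLoss"].std()
print(means)
print(sds)
```

The F statistic and its p value appear near the top of the summary output, and the two groupby results show which diet has the larger mean weight loss.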