[SOUND] While we now have evidence that depression is significantly associated with the number of nicotine dependence symptoms endorsed by the young adult daily smokers in our sample, another likely predictor of nicotine dependence symptoms is, of course, the number of cigarettes a person smokes each day.
>> What if number of cigarettes is associated with both our explanatory variable, major depression, and our response variable, nicotine dependence symptoms? What if it is really smoking, rather than major depression, that is associated with number of nicotine dependence symptoms?
>> To evaluate whether this is true, I add number of cigarettes smoked per day to my model. Before doing this, though, I will make sure that my categorical explanatory variables have one group that's coded zero, and I will center my quantitative variable. My major depression variable is already coded one equals depression and zero equals no depression. However, my quantitative number of cigarettes smoked variable ranges from one to 98. Because zero is not a valid value for this variable, I should center it by subtracting the mean number of cigarettes smoked from the actual value for each observation. Going back to the Python program for this example, we add this code to center the number of cigarettes smoked variable. First, we create a new variable called numbercigsmoked_c in our sub1 data frame. In brackets, we type the new variable name, numbercigsmoked_c, enclosed in quotes. After an equal sign, we put in parentheses the name of the original variable, sub1 numbercigsmoked, minus the mean of numbercigsmoked. We get the mean by adding .mean and an open and closed paren after the name of the variable. We can check to see if the variable is properly centered by asking Python to print the mean of our centered variable using the following code. We can see that the mean is equal to zero and then thirteen zeros one eight three seven, a value that is essentially zero, which confirms that the variable is centered at zero.
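The centering step just described can be sketched roughly as follows. This is only an illustration: the sub1 data frame here is a tiny made-up stand-in, not the lecture's data, and only the column names numbercigsmoked and numbercigsmoked_c come from the narration.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's sub1 data frame (values are made up).
sub1 = pd.DataFrame({"numbercigsmoked": [1, 5, 10, 20, 40, 98]})

# Center the variable: subtract its mean from every observation.
sub1["numbercigsmoked_c"] = sub1["numbercigsmoked"] - sub1["numbercigsmoked"].mean()

# Check the centering: the mean of the centered variable should be
# zero, or a tiny value very close to zero due to floating-point rounding.
centered_mean = sub1["numbercigsmoked_c"].mean()
print("mean of centered variable:", centered_mean)
```

On real data the printed mean is often a very small nonzero number rather than exactly zero, which is the rounding effect mentioned in the lecture.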
Now we can go back to our regression model for the association between depression and number of nicotine dependence symptoms, and add our centered number of cigarettes smoked variable. Here is the output. Examine the p values and parameter estimates for each predictor variable, i.e., our explanatory variable, depression, and our potential confounder, number of cigarettes smoked. As you can see, both p values are less than 0.05, and both of the parameter estimates are positive, indicating that having major depression and smoking more cigarettes are each associated with having a greater number of nicotine dependence symptoms.
>> Thus, we can conclude that both major depression and number of cigarettes smoked are significantly associated with number of nicotine dependence symptoms after partialing out the part of the association that can be accounted for by the other. In other words, depression is positively associated with number of nicotine dependence symptoms after controlling for number of cigarettes smoked. And number of cigarettes smoked is positively associated with number of nicotine dependence symptoms after controlling for the presence or absence of depression. Note that if a parameter estimate were negative and the p value significant, it would mean that there was a negative relationship between that variable and the response variable.
>> Suppose we started with a different explanatory variable. Dysthymia is pervasive, low-level depression that lasts a long time, often a few years. Suppose we wanted to test the linear relationship between dysthymia, a binary categorical explanatory variable, and number of nicotine dependence symptoms, a quantitative response variable.
>> You can see from the significant p value and positive parameter estimate that dysthymia is positively associated with number of nicotine dependence symptoms. That is, the presence of dysthymia is associated with a larger number of nicotine dependence symptoms.
And the absence of dysthymia is associated with a smaller number of nicotine dependence symptoms. While dysthymia is long-lasting, low-level depression, major depression is a disorder characterized by a discrete episode of severe depression. So what happens when we control for major depression in this model? As you can see, dysthymia is no longer significantly associated with the number of nicotine dependence symptoms after controlling for major depression. Here, we have an example of confounding. We would say that major depression confounds the relationship between dysthymia and number of nicotine dependence symptoms, because the p value for dysthymia is no longer significant when major depression is included in the model. As in the previous example, using multiple regression, we can continue to add variables to this model in order to evaluate multiple predictors of our quantitative response variable, number of nicotine dependence symptoms. Here we can see that, when evaluating the independent associations between several predictor variables and nicotine dependence symptoms, major depression and number of cigarettes smoked are positively and significantly associated with number of nicotine dependence symptoms, while dysthymia, age, and gender are not.
>> Note also that we've centered our quantitative age variable by subtracting the mean age from the actual age for each observation, following the same procedure we used to center our number of cigarettes smoked explanatory variable.
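The confounding check described above can be sketched like this, assuming the statsmodels formula interface is available. The data are simulated purely for illustration and are not the lecture's sample; the variable names dysthymia, majordep, and symptoms are hypothetical. In the simulation, symptoms depend only on major depression, while dysthymia is simply more common among people with major depression.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Simulated data (assumption for illustration only, not the lecture's sample).
majordep = rng.binomial(1, 0.3, n)
# Dysthymia is made more likely when major depression is present.
dysthymia = rng.binomial(1, np.where(majordep == 1, 0.5, 0.05))
# Symptoms are driven by major depression only, plus random noise.
symptoms = 2.0 + 1.5 * majordep + rng.normal(0, 1, n)

df = pd.DataFrame(
    {"majordep": majordep, "dysthymia": dysthymia, "symptoms": symptoms}
)

# Model 1: dysthymia alone as the explanatory variable.
unadjusted = smf.ols("symptoms ~ dysthymia", data=df).fit()

# Model 2: the same model, now controlling for major depression.
adjusted = smf.ols("symptoms ~ dysthymia + majordep", data=df).fit()

# Compare the dysthymia parameter estimate and p value across the two models.
for label, model in [("unadjusted", unadjusted), ("adjusted", adjusted)]:
    print(label, model.params["dysthymia"], model.pvalues["dysthymia"])
```

If the dysthymia estimate is clearly positive in the first model and shrinks toward zero once majordep is added, that is the pattern the lecture describes as confounding. Calling adjusted.summary() would print the full table of parameter estimates and p values.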