So, now let's test for moderation within the context of our final inferential test, the correlation coefficient. You might remember this scatter plot and correlation, based on the Gapminder data, between the rate of urban dwellers in each country and the Internet use rate. We found that this was a significant association, with a correlation of 0.61. But might this relationship, this correlation between urban rate and Internet use rate, differ for countries with different income levels?
>> To explore this question, we create a third variable that is categorical. For this new variable, the income-per-person variable, which is quantitative, will be categorized as high income countries, middle income countries, and low income countries. The adjustments we made to our program are very similar to the adjustments we made to our ANOVA syntax and to our chi-square syntax when testing moderation. We'll start our program by calling in our libraries and loading the Gapminder data. Next, we set our three variables of interest to numeric, and set blank data on our third variable to NaN. Then I create a new data frame, which I am calling data_clean, that drops all missing (that is, NaN) values for each of the variables in the data set. Now I create my income group variable, which splits the sample of countries into low, middle, and high income groups using the dummy codes 1, 2, and 3. Next I create three different data frames that include only one income group each, here called sub1 for low income countries, sub2 for middle income countries, and sub3 for high income countries. Then we request a Pearson correlation measuring the association between urban rate and Internet use rate, as well as its associated p-value, for each of our new data frames. We use the pearsonr function from the scipy.stats library and include our variables urban rate and Internet use rate.
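The steps just described can be sketched roughly as follows. This is a minimal illustration, not the program shown in the lecture: the column names (`urbanrate`, `internetuserate`, `incomeperperson`), the helper `correlations_by_income`, and the cutoffs `low_cut` and `high_cut` are all assumptions, since the narration does not give the income thresholds. The small `demo` frame is made-up data used only to exercise the code.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Assumed column names; adjust to match the actual Gapminder file.
COLS = ["urbanrate", "internetuserate", "incomeperperson"]


def correlations_by_income(data, low_cut, high_cut):
    """Return {income code: (r, p, n)} for urban rate vs. Internet use rate.

    low_cut and high_cut are hypothetical income thresholds:
    income <= low_cut is coded 1, income <= high_cut is coded 2, else 3.
    """
    data = data[COLS].copy()
    # Set the variables to numeric; blanks and other non-numbers become NaN.
    for col in COLS:
        data[col] = pd.to_numeric(data[col], errors="coerce")
    # Drop every row that is missing any of the three variables.
    data_clean = data.dropna().copy()
    # Split countries into low (1), middle (2), and high (3) income groups.
    data_clean["incomegrp"] = pd.cut(
        data_clean["incomeperperson"],
        bins=[-np.inf, low_cut, high_cut, np.inf],
        labels=[1, 2, 3],
    ).astype(int)
    results = {}
    for code in (1, 2, 3):
        # One data frame per income group (sub1, sub2, sub3 in the narration).
        sub = data_clean[data_clean["incomegrp"] == code]
        r, p = pearsonr(sub["urbanrate"], sub["internetuserate"])
        results[code] = (float(r), float(p), len(sub))
    return results


# Made-up values, for illustration only; the last row has a blank urban rate.
demo = pd.DataFrame({
    "urbanrate": ["20", "30", "40", "35", "50", "65", "70", "80", "90", " "],
    "internetuserate": ["2", "3", "5", "10", "20", "25", "60", "70", "85", "50"],
    "incomeperperson": ["300", "500", "700", "2000", "4000", "8000",
                        "20000", "30000", "40000", "1000"],
})
results = correlations_by_income(demo, low_cut=1000, high_cut=10000)
for code, (r, p, n) in results.items():
    print(f"income group {code}: r = {r:.3f}, p = {p:.3f}, n = {n}")
```

With the real data you would load the file first (for example with `pandas.read_csv`) and pass in whatever income cutoffs your analysis uses.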
When we examine the correlation coefficients between urban rate and Internet use rate for each of the income groups, we find the following. For the low income group, the correlation between urban rate and Internet use rate is 0.11, and the p-value is not significant. For the middle income countries, the correlation between Internet use rate and urban rate is 0.32, with a significant p-value of 0.001. And finally, among high income countries, the correlation coefficient is 0.089 with a large p-value, suggesting that the association between urban rate and Internet use rate is not significant for high income countries. When we map these findings onto the associated scatter plots for each income group, we are better able to visualize the significant and non-significant relationships. Estimating a line of best fit within each scatter plot shows the positive association between urban rate and Internet use rate among the middle income countries, and almost no relationship between these variables in both the low income and high income countries.
>> Asking questions about statistical interactions can be an interesting way to explore your data and your associations of interest. This is not difficult to do using the skills you've acquired thus far. There are more advanced topics that we can't cover here, such as multivariate techniques, that can be very powerful. But even without these techniques, we can still use the bivariate inferential tools of ANOVA, chi-square, and correlation to describe our sample, make inferences about the larger population, and really begin to understand under what conditions, or at what levels of our third variable, these associations hold. Now that we've found associations, can we assume that association implies causation? We'll answer that question soon.
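As a companion to the per-group lines of best fit mentioned above, here is a small sketch that estimates a least-squares line within each income group. It is not the lecture's plotting code: it uses `scipy.stats.linregress` to get the slope, intercept, and correlation for each group and leaves the drawing of the scatter plots to whatever plotting library you prefer. The helper name `best_fit_by_group`, the column names, and the `demo` values are assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import linregress


def best_fit_by_group(data, x="urbanrate", y="internetuserate", group="incomegrp"):
    """Return {group code: (slope, intercept, r)} from a least-squares fit per group."""
    fits = {}
    for code, sub in data.groupby(group):
        fit = linregress(sub[x], sub[y])
        fits[code] = (fit.slope, fit.intercept, fit.rvalue)
    return fits


# Made-up values, for illustration only (not Gapminder results).
demo = pd.DataFrame({
    "urbanrate": [10, 20, 30, 10, 20, 30],
    "internetuserate": [5, 7, 6, 10, 20, 30],
    "incomegrp": [1, 1, 1, 2, 2, 2],
})
fits = best_fit_by_group(demo)
for code, (slope, intercept, r) in fits.items():
    print(f"group {code}: y = {intercept:.2f} + {slope:.2f} * x (r = {r:.2f})")
```

A slope near zero with a small r corresponds to the nearly flat fitted lines described for the low and high income groups, while a clearly positive slope corresponds to the pattern described for the middle income group.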