>> To explore this question, we create a third variable that is categorical.
For this new variable, the income-per-person variable,
which is quantitative, will be categorized as high income countries,
middle income countries, and low income countries.
The adjustments we made to our program are very similar to the adjustments we made to
our ANOVA syntax, and to our chi-square syntax when testing moderation.
We'll start our program by calling in our libraries and loading the gapminder data.
Next, we set our three variables of interest to numeric, and
set blank data on our third variable to NaN.
Then I create a new data frame I am calling data_clean that drops all rows with missing,
that is NaN, values for any of the variables in the data set.
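The cleaning steps just described might look roughly like the sketch below. The column names urbanrate, internetuserate, and incomeperperson and the file name in the comment are assumptions, and the four-row data frame is a made-up stand-in for the real gapminder file so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the gapminder data; with the real file this
# would be something like pd.read_csv("gapminder.csv") (file name assumed).
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "incomeperperson": ["500.5", " ", "8000.2", "30000.9"],  # " " is a blank entry
    "internetuserate": ["3.1", "12.0", "45.5", "81.0"],
    "urbanrate": ["24.0", "46.7", None, "88.9"],
})

# Set blank entries on the third variable to NaN, then make all three
# variables of interest numeric (anything unparseable also becomes NaN).
data["incomeperperson"] = data["incomeperperson"].replace(" ", np.nan)
for col in ["urbanrate", "internetuserate", "incomeperperson"]:
    data[col] = pd.to_numeric(data[col], errors="coerce")

# Keep only the rows that have no missing values.
data_clean = data.dropna()
print(data_clean)
```

Only countries A and D survive here, because B has a blank income value and C has no urban rate.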
Now, I create my income group variable which splits the sample of countries into
low, middle, and high income groups using the dummy codes 1, 2, and 3.
Next I create three different data frames that include only one income group each,
called sub1 for low income countries, sub2 for
middle income countries, and sub3 for high income countries.
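One way to sketch the grouping step is with pandas' cut function. The two cutoff values below are illustrative placeholders, not the cutoffs used in the lecture, and the column names and the small data frame are again assumptions made so the example is self-contained.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for data_clean (column names are assumptions).
data_clean = pd.DataFrame({
    "incomeperperson": [300.0, 700.0, 2500.0, 9000.0, 15000.0, 40000.0],
    "internetuserate": [2.0, 5.5, 20.1, 41.3, 70.2, 88.0],
    "urbanrate": [18.0, 30.5, 45.0, 62.4, 75.9, 91.2],
})

# Hypothetical income cutoffs, chosen only for illustration.
LOW_CUTOFF = 1000.0
HIGH_CUTOFF = 10000.0

# Dummy codes: 1 = low income, 2 = middle income, 3 = high income.
data_clean["incomegrp"] = pd.cut(
    data_clean["incomeperperson"],
    bins=[-np.inf, LOW_CUTOFF, HIGH_CUTOFF, np.inf],
    labels=[1, 2, 3],
).astype(int)

# One data frame per income group.
sub1 = data_clean[data_clean["incomegrp"] == 1]  # low income
sub2 = data_clean[data_clean["incomegrp"] == 2]  # middle income
sub3 = data_clean[data_clean["incomegrp"] == 3]  # high income
print(len(sub1), len(sub2), len(sub3))
```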
Then we request a Pearson correlation measuring the association between
urban rate and Internet use rate as well as its associated P value for
each of our new data frames.
We use the pearsonr function from the scipy.stats library and
include our variables urban rate and Internet use rate.
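A minimal sketch of that request follows, assuming the same column names as before. The data here are randomly generated placeholders, so the printed coefficients will not match the gapminder results; the point is only the shape of the pearsonr call for each data frame.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Randomly generated stand-in for data_clean (not real gapminder values).
rng = np.random.default_rng(0)
n = 90
data_clean = pd.DataFrame({
    "incomegrp": np.repeat([1, 2, 3], n // 3),
    "urbanrate": rng.uniform(10, 100, n),
    "internetuserate": rng.uniform(0, 95, n),
})

sub1 = data_clean[data_clean["incomegrp"] == 1]  # low income
sub2 = data_clean[data_clean["incomegrp"] == 2]  # middle income
sub3 = data_clean[data_clean["incomegrp"] == 3]  # high income

# pearsonr returns the correlation coefficient and its two-sided p value.
results = {}
for label, sub in [("low", sub1), ("middle", sub2), ("high", sub3)]:
    r, p = stats.pearsonr(sub["urbanrate"], sub["internetuserate"])
    results[label] = (r, p)
    print(f"{label} income: r = {r:.3f}, p = {p:.4f}")
```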
When we examine the correlation coefficients between urban rate and
Internet use rate for each of the income groups, we find the following.
For the low income group the correlation between urban rate and
Internet use rate is 0.11 and the P value is not significant.
For the middle income countries, the association between Internet use rate and
urban rate is 0.32, with a significant P value of 0.001.
Finally, among high income countries, the correlation coefficient is 0.089,
with a large P value, suggesting that the association between urban rate and
Internet use rate is not significant for high income countries.
When we map these findings onto the associated scatter plots for
each income group, we are better able to visualize the significant and
non-significant relationships.
Estimating a line of best fit within each scatter plot
shows the positive association between urban rate and
Internet use rate among the middle income countries,
and almost no relationship between these variables in both the low income and
high income countries.
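The scatter plots with a line of best fit could be drawn in several ways. The sketch below uses matplotlib with a least-squares line from numpy's polyfit, which is a stand-in choice and not necessarily the plotting call used in the lecture. The data are random placeholders and the column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in for data_clean (not real gapminder values).
rng = np.random.default_rng(1)
n = 90
data_clean = pd.DataFrame({
    "incomegrp": np.repeat([1, 2, 3], n // 3),
    "urbanrate": rng.uniform(10, 100, n),
    "internetuserate": rng.uniform(0, 95, n),
})

titles = {1: "Low income", 2: "Middle income", 3: "High income"}
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
slopes = {}

for ax, (grp, title) in zip(axes, titles.items()):
    sub = data_clean[data_clean["incomegrp"] == grp]
    x = sub["urbanrate"].to_numpy()
    y = sub["internetuserate"].to_numpy()

    # Least-squares straight line: y = slope * x + intercept.
    slope, intercept = np.polyfit(x, y, 1)
    slopes[grp] = slope

    ax.scatter(x, y)
    xs = np.array([x.min(), x.max()])
    ax.plot(xs, slope * xs + intercept)
    ax.set_title(title)
    ax.set_xlabel("Urban rate")

axes[0].set_ylabel("Internet use rate")
fig.tight_layout()
```

A clearly sloped line in one panel next to nearly flat lines in the others is the visual counterpart of a correlation that differs across income groups.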
>> Asking questions about statistical interactions can be an interesting way to
explore your data and your associations of interest.
This is not difficult to do using the skills you've acquired thus far.