To demonstrate how to request a correlation coefficient in Python, let's go back to the scatter plots we created for some of the Gapminder variables. We use these scatter plots when visualizing the association between two quantitative variables. The first scatter plot shows the rate of internet users by the rate of the country's population living in urban settings. The second shows the rate of internet users by income per person. From looking at the scatter plots, we can guess the associations are positive; that is, a higher internet use rate is associated with both higher urban rates and greater income.

Now let's find the correlation coefficients. To do this in Python, we add the following syntax to our Gapminder program. First, I create a new data frame, which I call data_clean, that drops all missing (NA) values for each of the variables from the Gapminder data set. We do this because a correlation coefficient cannot be calculated in the presence of NAs. Next, I request a Pearson correlation measuring the association between urban rate and internet use rate, and then between income per person and internet use rate. I use the pearsonr function from the SciPy stats library and include each variable pair in a separate command. Python will then generate both the correlation coefficient and the associated p-value.

For the association between urbanrate and internetuserate, the correlation coefficient is approximately 0.61, with a very small p-value. This tells us that the relationship is statistically significant. For the association between incomeperperson and internetuserate, the correlation coefficient is approximately 0.75, which also has a significant p-value. Now we can actually interpret the scatter plots and the correlation coefficients. The association between internet use rate and income is fairly strong, and it's also positive, as the scatter plot has already shown us.
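The steps just described can be sketched in code. Since the course's gapminder.csv file isn't reproduced here, this sketch fabricates a small DataFrame with the same column names (urbanrate, internetuserate, incomeperperson) so it runs standalone; in the course program you would load the real file with pd.read_csv instead.

```python
import numpy as np
import pandas as pd
import scipy.stats

# In the course: data = pd.read_csv('gapminder.csv'). Here we fabricate a
# small DataFrame with the same column names so the sketch runs standalone.
rng = np.random.default_rng(0)
urban = rng.uniform(10, 100, 50)
income = rng.uniform(500, 40000, 50)
data = pd.DataFrame({
    'urbanrate': urban,
    'internetuserate': 0.8 * urban + rng.normal(0, 10, 50),
    'incomeperperson': income,
})
# Insert a few missing values to show why dropping NAs is needed.
data.loc[0:2, 'urbanrate'] = np.nan

# pearsonr cannot handle NAs, so drop rows with missing values first.
data_clean = data.dropna()

# One pearsonr call per variable pair; each returns (r, p-value).
r_urban, p_urban = scipy.stats.pearsonr(data_clean['urbanrate'],
                                        data_clean['internetuserate'])
r_income, p_income = scipy.stats.pearsonr(data_clean['incomeperperson'],
                                          data_clean['internetuserate'])
print('urbanrate vs internetuserate:', r_urban, p_urban)
print('incomeperperson vs internetuserate:', r_income, p_income)
```

With the real Gapminder data, the first pair gives r of roughly 0.61 and the second roughly 0.75, as discussed above; the synthetic numbers here will differ.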
The association between internet use rate and urban rate is also positive, but slightly more modest at 0.61. Both are statistically significant; that is, for both associations, it is highly unlikely that a relationship of this magnitude would be due to chance alone.

>> Here's some good news. Post hoc tests are not necessary when conducting a Pearson correlation. Post hoc tests are needed only when your research question includes a categorical explanatory variable with more than two levels. Because the explanatory variable, in the context of a correlation coefficient, is quantitative, there's never a need to perform a post hoc test.

Another interesting and useful aspect of the correlation coefficient is that if we square it, that is, multiply it by itself, we get a value that also helps our understanding of the association between the two quantitative variables. The squared coefficient, r squared, is the fraction of the variability of one variable that can be predicted by the other. For example, when looking at the relationship between urban rate and internet use rate, if we square our correlation coefficient of 0.61, we get 0.37. This can be interpreted the following way: if we know the urban rate, we can predict 37% of the variability we will see in the rate of internet use. Of course, that also means that 63% of the variability is unaccounted for. If we square the correlation coefficient for income per person and internet use rate, we get a value of 0.56. This suggests that if we know income per person, we can predict 56% of the variability we'll see in the rate of internet use. This is a little more impressive, because we can predict over half the variability. Again, correlation coefficients are commonly denoted with a lowercase r, and they're squared to determine the amount of variability that can be predicted.

You might be wondering how much variability in internet use rates can be predicted if we consider both urban rate and income per person.
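The r squared arithmetic above is easy to check directly. A quick sketch, using the two coefficients quoted in the lesson:

```python
# Squaring the correlation coefficients reported in the lesson gives the
# fraction of variability in internet use rate that each predictor accounts for.
r_urban = 0.61    # urbanrate vs internetuserate
r_income = 0.75   # incomeperperson vs internetuserate

r2_urban = round(r_urban ** 2, 2)    # 37% predicted, 63% unaccounted for
r2_income = round(r_income ** 2, 2)  # over half the variability predicted
print(r2_urban, r2_income)
```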
A multivariate inferential tool called multiple regression can be used to answer this question, and we'll discuss it in a future lesson.