Welcome back to Practical Time Series Analysis. This is the last of our gentle review videos, where we're going over some of the concepts you would have studied in your introductory statistics classes. This one in particular deals with correlation. Correlation is a really critical topic for us. As we study time series, very often we use perhaps the most important graphical tool that we have, the ACF or autocorrelation function, and in order to interact meaningfully with that, we have to have a good understanding of what correlation is all about. There are many ways to measure the association between two variables. Linear association is very common, and perhaps the most common measure of all is Pearson's product moment correlation coefficient. That's what we talk about in this video. Specifically, we'll review how to plot data in such a way as to make a quick visual judgment about whether we think there's a linear association between the underlying variables. We'll look at the formulas for covariance and for correlation, and we'll try to understand where the definitions come from. I'll try to convince you that this is really the definition you would come up with if you took some time to think about it. It's nice to have an example to guide our thinking. We'll look at the trees example, which should be available to you just by opening R. If you run the help command on trees, you'll see that we're looking at the relationship between girth, height, and volume for black cherry trees. I think of volume as speaking to the commercial utility of a particular tree: how much lumber are we going to get out of it? That's an interesting quantity, and when you're out in the middle of the woods, it's hard to measure directly, but we can measure related variables like girth and height. Those are things you could get with a tape measure or perhaps a Biltmore stick, as I remember from my Earth Science class in middle school.
Those are very, very easy to obtain, and the question is, can we use them to make predictions about volume? The pairs plot that I've made from this data set with a couple of plotting commands (you can see that I'm using red dots) tells the story. Girth is very strongly associated with volume; girth is a really great predictor of volume in these trees. The height of a tree is also a decent predictor: not surprisingly, as the height increases, so does the volume you're going to obtain. But really, girth is the strong predictor. Let's calculate the covariance. This might be a little surprising. The covariance between girth and volume is just a little under 50, and the covariance between height and volume is actually larger. That's not consistent with the pictures we just saw, unless you start thinking about the units that are involved. When we take a correlation, we try to look at the relationship without worrying about the units. As we switch from yards to feet to miles to kilometers, we're going to change the covariance, but the correlation should remain the same. And reassuringly, here we see that the correlation between girth and volume is really quite high. In these pictures, we try to understand where a formula measuring the strength of linear association might come from. You'd probably agree that on the left we have a set of data points which fall very close to a straight line; not so on the right. I've created a sort of local set of axes here based upon the averages: through the average y value you can draw a horizontal line, and through the average x value a vertical line. You can see that the data falls predominantly in the first and third quadrants. So think about what a deviation might look like. Each data point is an ordered pair, so there's a deviation in x and a deviation in y. This data point is above average in x, and it's above average in y as well.
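If you'd like to follow along in R before we continue with the deviations picture, here is a minimal sketch of the plot and the numbers just described. The exact plotting options are my guess at what's on screen in the video; the data set itself is the built-in trees data.

```r
# Built-in data set: Girth, Height, and Volume for 31 black cherry trees
data(trees)

# Pairs plot with red dots, as in the video
pairs(trees, col = "red", pch = 19)

# Covariances: height-volume comes out larger than girth-volume,
# even though the scatterplots show girth as the stronger predictor.
# Covariance carries the units of the variables.
cov(trees$Girth,  trees$Volume)   # just under 50
cov(trees$Height, trees$Volume)   # larger than the girth-volume covariance

# Correlation strips the units away and tells the right story
cor(trees$Girth,  trees$Volume)   # quite high, roughly 0.97
cor(trees$Height, trees$Volume)   # decent, but clearly weaker
```

Try rescaling one of the columns (say, height from feet to meters): the covariances change, but the correlations do not.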
So we'll look at the deviations: we'll have positive quantities for both the x deviation and the y deviation, and if we multiply those together, a positive times a positive is a positive. We'll do that for every data point. Down here, where we also have quite a few data points, the x values are below average and the y values are below average, so the deviations in x and in y are both negative, and a negative times a negative is a positive. So if we take some sort of cumulative measure by, say, adding up all of the products of the deviations, we're going to get something that contributes coherently. Now look at the second and fourth quadrants. In the second quadrant, the x values are below average and the y values are above average, and a negative times a positive is a negative. In the fourth quadrant, the x values are above average and the y values are below, so again the product of the deviations gives us a negative value. In the picture on the left, the positives clearly outnumber the negatives, and we get a strong contribution toward the covariance. In the figure on the right, each of the quadrants seems pretty much equally stocked with data points: in quadrants one and three we get positive contributions, in quadrants two and four we get negative contributions, and they roughly cancel out. So we would expect a strong covariance in the first picture and a weak covariance in the second. The formulas that we typically use reflect the conversation we just had. When you look at data, your covariance will be a sum of products of deviations in x and y, and we can even take an averaged quantity. Instead of dividing by n, the number of data points, we'll come up with an unbiased estimator by dividing by n - 1.
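The quadrant argument above is easy to check numerically: summing the products of the deviations and dividing by n - 1 reproduces exactly what R's built-in cov function computes. A small sketch, again using the trees data:

```r
data(trees)
x <- trees$Girth
y <- trees$Volume

# Deviations from the respective means: the "local axes" in the picture
dx <- x - mean(x)
dy <- y - mean(y)

# Products of deviations: positive in quadrants I and III,
# negative in quadrants II and IV
products <- dx * dy

# For girth vs. volume, the positive products dominate
sum(products > 0)   # most of the 31 points
sum(products < 0)   # only a few

# Summing the products and dividing by n - 1 gives the sample covariance
n <- length(x)
manual_cov <- sum(products) / (n - 1)

manual_cov   # same value as...
cov(x, y)    # ...R's built-in covariance
```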
The corresponding formula, a little more theoretically, when considering random variables, looks at the covariance as an average: the expected value of the product of the centered random variables. The correlation works the same sort of way. For random variables, the correlation is again an expected value, an averaged quantity, and here we do some centering and some scaling to get rid of those units. You'll recall from your elementary stats course that if you take a data point, subtract the mean, and divide by the standard deviation, you're talking about standard units; very often people use the letter z to represent that. For data sets, no surprise here: we'll estimate the standard deviation and do our centering and scaling. There are more compact formulas that we can come up with if we introduce sum of squares notation. We've seen this before. There's a definitional formula for Sxx, the sum of (x - x bar)(x - x bar), but there's also a corresponding computational formula that you get just by doing a little bit of algebra. This allows us to write our covariance and our correlation much more compactly. What we're doing here is substituting the formula for the standard deviation in terms of the sums of squares, and then noticing that there's a sum of squares in the numerator and sums of squares in the denominator. We cancel all of the n - 1 terms to get rid of some clutter, and at the end of the day, the correlation can be expressed rather simply in terms of sums of squares: r = Sxy / sqrt(Sxx * Syy). In this video, we took some time to recall pairwise plotting, which gave us a nice visual way of assessing the strength of linear association. We looked at the motivation behind the calculations for covariance and correlation, and I tried to convince you that the formulas really do make sense.
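The sums-of-squares identities above can also be verified directly. This sketch checks that the definitional and computational forms agree, and that the correlation built from sums of squares matches R's cor function:

```r
data(trees)
x <- trees$Girth
y <- trees$Volume
n <- length(x)

# Definitional sums of squares
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Computational (shortcut) forms, a little algebra away
Sxx_comp <- sum(x^2) - sum(x)^2 / n
Sxy_comp <- sum(x * y) - sum(x) * sum(y) / n

# Correlation in terms of sums of squares: all the n - 1 terms cancel
r <- Sxy / sqrt(Sxx * Syy)

r          # same value as...
cor(x, y)  # ...R's built-in correlation
```

Notice that nothing here depended on dividing by n - 1 at all: because the same factor appears in the numerator and the denominator, it cancels out of the correlation completely.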