Social scientists have shown that a leader's physical height is related to his or her success. Suppose you want to test if you can replicate this result. To do that, you look at the heights and average approval ratings of the four most recent presidents of the United States. You employ this data matrix, and your goal is to answer four related questions. One, is there a linear relationship between the two variables? Two, what is the size of Pearson's r correlation coefficient? Three, what do the regression equation and the regression line look like? And four, what is the size of r-squared? Let's start with the first question. Is there a linear relationship between the two variables? To answer that question, we make a scatterplot. To make a scatterplot, you must first decide which variable is the dependent variable and which is the independent variable. In this case, it's more likely that the leader's physical height influences his or her approval ratings than that approval ratings affect the leader's height. After all, it would be silly to expect the leader to become taller once his or her approval ratings get better. So, the independent variable, height, goes on the x axis, and the dependent variable, approval rating, on the y axis. Based on the minimum and maximum values of our variables, we scale our axes. Our independent variable, height, ranges from 182 centimeters to 188 centimeters. We therefore use a scale from 180 to 190 centimeters. Our dependent variable ranges from 47 to 60.9, so we scale this axis from 45 to 65. Next, we decide based on our data matrix where we should position the four presidents. Obama is 185 centimeters tall and has an average approval rating of 47, so he should be positioned here. Bush Jr. has a physical height of 182 centimeters and an average approval rating of 49.9, so we position him here. Clinton and Bush Sr. are located here.
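The axis scaling described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows for Obama and Bush Jr. are the ones quoted in the video, while the rows for Clinton and Bush Sr. are not read out and are reconstructed here from the stated ranges and means. The rule of rounding outward to a multiple of 5 is also an assumption that happens to reproduce the scales used in the video; the helper name `axis_limits` is hypothetical.

```python
import math

# Data matrix: (president, height in cm, average approval rating).
# Obama and Bush Jr. are quoted in the video; the Clinton and Bush Sr.
# rows are assumptions reconstructed from the stated ranges and means.
presidents = [
    ("Obama", 185, 47.0),
    ("Bush Jr.", 182, 49.9),
    ("Clinton", 188, 55.1),   # assumed values
    ("Bush Sr.", 188, 60.9),  # assumed values
]


def axis_limits(values, step=5):
    """Round the observed range outward to the nearest multiple of `step`."""
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high


heights = [height for _, height, _ in presidents]
approvals = [approval for _, _, approval in presidents]

x_limits = axis_limits(heights)    # independent variable, x axis
y_limits = axis_limits(approvals)  # dependent variable, y axis
print("x axis:", x_limits)
print("y axis:", y_limits)
```

With these limits, each president is plotted as one point, with height on the x axis and approval rating on the y axis.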
Now we can answer the first question: yes, there seems to be a linear relationship between a leader's height and his or her approval rating. The line describing this relationship goes up, which means the correlation between the two variables is positive. The second question is what the value of Pearson's r is. To compute Pearson's r, we need this formula. To start with, we need to compute all the z-scores of both our independent and our dependent variable. To do that, we need the means and standard deviations of these variables. I assume that you know how to compute them, so I will just give them to you. The mean of the independent variable, height, is 185.75 centimeters, and the standard deviation is 2.87 centimeters. The mean approval rating, the dependent variable in the study, is 53.23, and the standard deviation is 6.12. First, we compute the z-scores for our independent variable by subtracting the mean from every original score and then dividing the outcome by the standard deviation. We do that here. 185 minus 185.75, divided by 2.87, makes -0.26132. We also do that for the other scores; here are the results. We then repeat that for the dependent variable. 47 minus 53.23, divided by 6.11, makes -1.01964. And we do that for the other cases, too. The next step is to multiply, for every case, its two z-scores with each other. For the first case, this results in -0.26132 multiplied with -1.01964, which makes 0.266456, and so on. We have now finished this part of the formula. Next, we have to add up all these values, which makes 2.202649. Finally, we have to divide by n minus 1. The n is 4, so n minus 1 equals 4 minus 1, which is 3. The result, rounded to two decimals, is 0.73. That's our Pearson's r. It indicates that there's a rather strong and positive linear correlation between a leader's body height and his or her average approval rating. The next step is to find the regression equation. The computer finds the regression line by looking for the line that minimizes the sum of the squared residuals.
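The z-score recipe for Pearson's r described above can be retraced with a short Python sketch. It assumes the same four data points as the video, where the values for Clinton and Bush Sr. are reconstructed from the stated means and ranges rather than quoted. The function names are illustrative. Because the code keeps full precision instead of the rounded intermediate values used in the hand calculation, individual z-scores may differ in the later decimals.

```python
from statistics import mean, stdev

# Heights (cm) and average approval ratings in the same case order.
# The last two cases (Clinton, Bush Sr.) are assumed values.
heights = [185, 182, 188, 188]
approvals = [47.0, 49.9, 55.1, 60.9]


def z_scores(values):
    """Subtract the mean from each score, then divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Sum the products of paired z-scores and divide by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    products = [a * b for a, b in zip(zx, zy)]
    return sum(products) / (len(x) - 1)


r = pearson_r(heights, approvals)
print("Pearson's r:", round(r, 2))
```

Rounded to two decimals, this should agree with the value found by hand.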
You do not have to do this yourself. Luckily, the complicated procedure boils down to two rather simple formulas. One formula to compute the regression coefficient, that's this one, and one formula to compute the intercept, that's this one, and together these formulas give you your regression line. We already have all our necessary ingredients, so now we can use the formulas. The regression slope is 0.73 multiplied with 6.12 divided by 2.87, which makes 1.56. The intercept is 53.23 minus 1.56 multiplied with 185.75, that makes -237.11. The regression equation is y-hat equals -237.11 plus 1.56 times x. The intercept indicates that the predicted y-value is -237.11 when x is zero. This number has no substantive meaning because a physical height of 0 centimeters is impossible. The intercept only serves mathematical purposes. It makes it possible to draw the line. With the regression equation found, we can predict the value of our dependent variable when our independent variable equals 182 centimeters, the minimum value in the sample. That's -237.11 plus 1.56 times 182, which makes 46.81. That's here. We can also do that for our maximum value. That's -237.11 plus 1.56 multiplied with 188, which is 56.17, and that's here. We can now draw the regression line. This line is the straight line that best represents the linear relationship between x and y. It is the line for which the sum of the squared residuals is the smallest. We can, of course, predict y-values for every possible x-value. All of these predicted y-values, or y-hats, are located on the regression line. The fourth question we want to answer is what the value of r-squared is. That's easy, it's Pearson's r squared, so 0.73 multiplied with 0.73 equals 0.53. But how should we interpret this number? Well, we can say that the prediction error is 53% smaller when we use the regression line than when we employ the mean of the dependent variable.
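The two formulas and the interpretation of r-squared can be checked with a small Python sketch. It assumes the same four cases as before, with the Clinton and Bush Sr. values reconstructed rather than quoted, and all variable names are illustrative. The code carries full precision, so its intercept and predictions can differ somewhat from the figures quoted in the video, which rest on rounded intermediate values; the slope and r-squared should agree to two decimals.

```python
from statistics import mean, stdev

# Heights (cm) and average approval ratings in the same case order.
# The last two cases (Clinton, Bush Sr.) are assumed values.
heights = [185, 182, 188, 188]
approvals = [47.0, 49.9, 55.1, 60.9]

n = len(heights)
mean_x, mean_y = mean(heights), mean(approvals)
sd_x, sd_y = stdev(heights), stdev(approvals)

# Pearson's r: sum of products of paired z-scores, divided by n - 1.
r = sum(
    ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
    for x, y in zip(heights, approvals)
) / (n - 1)

# Slope: r times (SD of y divided by SD of x).
slope = r * sd_y / sd_x
# Intercept: mean of y minus slope times mean of x.
intercept = mean_y - slope * mean_x


def predict(x):
    """Return y-hat for a given height x."""
    return intercept + slope * x


# Prediction error around the regression line versus around the mean of y.
error_line = sum((y - predict(x)) ** 2 for x, y in zip(heights, approvals))
error_mean = sum((y - mean_y) ** 2 for y in approvals)
error_reduction = 1 - error_line / error_mean
r_squared = r ** 2

print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("y-hat at 182 cm:", round(predict(182), 2))
print("y-hat at 188 cm:", round(predict(188), 2))
print("r-squared:", round(r_squared, 2))
print("error reduction:", round(error_reduction, 2))
```

The last two printed values coincide, which is the sense in which r-squared measures how much smaller the squared prediction error is with the regression line than with the mean.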
We can also say that 53% of the variation, or the variance, in the dependent variable is explained by our independent variable. So, what have we done in this video? First, we determined the straight line that best describes the relationship between our two variables. Second, we predicted values of our dependent variable based on the line and the corresponding regression equation. And third, by means of Pearson's r and r-squared, we investigated how well the line fits our data. But what have we learned substantively? Well, that tall leaders are more successful than short leaders. However, this conclusion is based on a sample of only four American presidents, who don't differ much from each other when it comes to their physical height. It is up to you to decide if this warrants far-reaching inferences about the relationship between height and approval ratings.