So in this video I want to provide a primer on how to interpret regressions. We're going to use regression analysis throughout this course, mainly interpreting regressions conducted by others, so it's important to understand the basics of what regression output is telling us. I provide a quick review of basic regression analysis in this video. Of course, there are many courses on Coursera that cover statistics, and we even have a great statistics course offered by Fatina as part of the IMBA program, where you can see the Coursera videos she has produced. So let's start out in a finance context by simply looking at a scatter plot from a CAPM regression. On the y-axis we have the excess return of stock A, and on the x-axis we have the excess return of the market. What do we observe here, before we even run a regression? We see a positive correlation. Generally, when the market is doing well, the stock is doing well, and when the market is doing poorly, the stock is doing poorly. This is just data that I created for purposes of this illustration. Then we estimate a regression, which gives us this regression prediction line, and the line has a positive slope. The slope tells us: for a one-unit change in the market excess return, what's the predicted change in stock A's excess return? We call this slope beta. It's the beta from the CAPM. What's alpha? Alpha is how much the stock is under- or overperforming its benchmark. If the market excess return is zero, the CAPM prediction is that the stock's excess return should also be zero. But instead the regression line predicts some positive number, and this positive number is the alpha of the stock. So what are the key regression parameters, not only from a CAPM regression, but from a regression in general?
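The scatter-plot intuition above can be made concrete. Below is a minimal Python sketch, assuming ordinary least squares with a single regressor, of how the slope (beta) and intercept (alpha) of the prediction line are computed. `capm_ols` is a hypothetical helper name, and the return series are made up for illustration, much like the lecture's own example data.

```python
# A minimal sketch of estimating a CAPM regression by ordinary least squares:
#   stock_excess = alpha + beta * market_excess + error
# The return series below are made up for illustration only.

def capm_ols(market_excess, stock_excess):
    """Return (alpha, beta) from a one-factor OLS regression (hypothetical helper)."""
    n = len(market_excess)
    if n != len(stock_excess) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_m = sum(market_excess) / n
    mean_s = sum(stock_excess) / n
    # beta = covariance(stock, market) / variance(market)
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(market_excess, stock_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var
    # An OLS line with an intercept passes through the point of means, which pins down alpha.
    alpha = mean_s - beta * mean_m
    return alpha, beta

# Hypothetical monthly excess returns, in percent
market = [-4.0, -2.0, 0.0, 1.0, 3.0, 5.0]
stock = [-2.5, -0.5, 1.0, 1.0, 2.5, 4.0]
alpha, beta = capm_ols(market, stock)
print(f"alpha = {alpha:.3f}% per month, beta = {beta:.3f}")
```

With these made-up numbers the fitted line has a positive slope below one and a positive intercept, the same shape as the illustrative plot: the intercept is the predicted stock excess return when the market excess return is zero.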
Obviously there are the coefficients from the regression, the standard errors of those coefficients, and various measures that indicate the precision of a coefficient estimate, like the p-value, the t-statistic, and the statistical significance of the estimate. And then there's the R-squared, which measures the goodness of fit of the model. The p-value, the statistical significance level, and the t-statistic are all commonly reported, and they're all related: they all give you a sense of how precise a coefficient estimate is, okay. The p-value represents the probability of finding the observed coefficient estimate, or something more extreme, under the null hypothesis. A typical null hypothesis to be tested is whether the coefficient equals zero. So if you estimate a coefficient and the estimate is very large in magnitude and precisely measured, the p-value for the test of whether the coefficient equals zero is going to be very small. That would indicate that, given your regression results, it's extremely unlikely to get a coefficient this large if the true coefficient is zero. How about statistical significance? You may hear someone say, this is statistically significant at the 5% level. That means the p-value of the estimate is less than 0.05. The commonly reported significance levels are 10%, 5%, and 1%, and they're commonly indicated with one star for an estimate significant at the 10% level, two stars for the 5% level, and three stars for the 1% level. If we lived in a Simpsons world with four fingers, we'd probably be reporting whether results are significant at the 8%, 4%, or 1% level. But we don't live in a Simpsons world; we have five fingers. That's probably why we have these natural benchmarks of 10%, 5%, and 1% statistical significance. As for the t-statistic, it's simply the ratio of a coefficient estimate to the standard error of that coefficient.
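The links among the t-statistic, the p-value, and the significance stars described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the p-value uses a standard normal approximation (a large-sample shortcut; exact small-sample tests use the Student t distribution), the helper names are hypothetical, and the coefficient and standard-error pairs are made up.

```python
import math

def t_statistic(coef, se):
    """Coefficient estimate divided by its standard error."""
    return coef / se

def two_sided_p_value(t):
    """Two-sided p-value for H0: coefficient = 0, using a standard normal
    approximation. For Z ~ N(0, 1), P(|Z| >= |t|) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2))

def significance_stars(p):
    """Conventional table notation: *** at 1%, ** at 5%, * at 10%."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""

# Hypothetical (coefficient, standard error) pairs
for coef, se in [(0.80, 0.05), (0.30, 0.14), (0.10, 0.09)]:
    t = t_statistic(coef, se)
    p = two_sided_p_value(t)
    print(f"coef={coef:.2f}  se={se:.2f}  t={t:.2f}  p={p:.4f}  {significance_stars(p)}")
```

Because the p-value depends only on the absolute value of t, a t-statistic of -2.5 is exactly as significant as +2.5.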
So it can be positive or negative depending on whether the coefficient is positive or negative, and it's used to test the null hypothesis that the coefficient is zero. Under normality, a t-statistic with an absolute magnitude of about two, either greater than plus two or less than minus two, corresponds roughly to a p-value of 0.05, or statistical significance at the 5% level, okay? So what are common cutoffs used in practice to evaluate whether an estimate is viewed as statistically significant? A p-value of 0.05 or smaller; statistical significance at the 5% level (we're not in the Simpsons world, so it's not 4%); and a t-statistic with an absolute magnitude of 2 or more. Assuming we're using a 5% significance level, one can statistically reject the hypothesis that a coefficient equals x if the estimate is at least two standard errors away from x. And as I already mentioned, a common test is the null hypothesis that a coefficient is zero, in which case x would be zero. Another key regression parameter is the R-squared, a measure of the goodness of fit of the regression. Technically, it measures how much of the variability in the dependent variable is explained by the independent variables, and it lies between zero and one. Zero means the x variables have no explanatory power for the left-hand-side y variable, the dependent variable. An R-squared of one means all the data points line up perfectly on the regression line. So let's actually go through an example: Coca-Cola. We estimate using monthly data, which I think is 1,020 months over the period 1927 to 2011.
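Before working through the Coca-Cola numbers, here is a minimal Python sketch of two tools from this part: the rule of thumb for rejecting "coefficient = x" when the estimate is about two or more standard errors away from x, and R-squared computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and example inputs are hypothetical; the default cutoff of 2.0 is the lecture's rounded version of the large-sample value of about 1.96.

```python
def r_squared(y, y_hat):
    """Share of the variability in y explained by the fitted values:
    R^2 = 1 - SS_residual / SS_total.
    For OLS with an intercept this lies between 0 and 1; for arbitrary
    fitted values it can be negative."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, y_hat))
    return 1.0 - ss_resid / ss_total

def reject_equals(estimate, se, x, critical=2.0):
    """Rule of thumb: reject H0 'coefficient = x' at roughly the 5% level
    when the estimate is at least `critical` standard errors from x."""
    return abs(estimate - x) / se >= critical

# Hypothetical estimates: beta = 0.8 with standard error 0.05
print(reject_equals(0.8, 0.05, 0.0))   # 16 standard errors from zero
print(reject_equals(0.8, 0.05, 1.0))   # 4 standard errors from one
print(reject_equals(0.95, 0.05, 1.0))  # only 1 standard error from one
```

Testing against zero is just the special case x = 0, which is why the ordinary t-statistic is the estimate divided by its standard error.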
We take the stock returns of Coca-Cola, measure them in excess of the Treasury bill rate, and relate them to the excess returns of the stock market over this long sample period. When we do this, we get various parameters from our regression: we get beta, we get alpha, and we get an R-squared measuring how much of the variability in Coca-Cola's returns is explained by the ups and downs of the market. Here's a scatter plot when we chart this data in Excel and then run a regression. Here's the regression line relating the excess return of Coca-Cola to the excess return of the market, and here's the regression equation we get. Let's zoom in a bit. One of the things you can see is the positive alpha right here: when the excess market return is zero, this line doesn't go through the origin; it actually crosses at a positive number, 0.59% per month. That's Coca-Cola's alpha. Zooming back out on the scatter plot, there seems to be a positive relation: Coca-Cola does better when the market does better, so you see this positively sloped line. So let's actually go and look at the numbers. Here's the Excel output with our regression results. The beta, the coefficient on the excess market return, is 0.57, and we can see that it has a very small standard error. Alpha is the constant, or intercept, of the regression; that's measured here as 0.59. And the R-squared, the fraction of the variability of Coca-Cola's returns explained by variability in the market, is 0.26. Let's look at these one by one. The beta of 0.57 is very precisely measured. It's clearly statistically different from zero, but the standard error is so small that we can also say it's statistically different from one. So Coca-Cola is a defensive stock.
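The regression output described above (alpha, beta, their standard errors, and R-squared) can be approximated in Python, including the test of beta against one as well as zero. This is a minimal sketch assuming classical (homoskedastic) OLS standard errors; `capm_regression` is a hypothetical helper. The data are simulated, not Coca-Cola's actual returns: 1,020 months to match the sample length, with arbitrary "true" values alpha = 0.6 and beta = 0.6.

```python
import math
import random

def capm_regression(market_excess, stock_excess):
    """One-factor OLS with classical standard errors (hypothetical helper).
    Returns alpha, beta, their standard errors, and R-squared."""
    n = len(market_excess)
    mean_m = sum(market_excess) / n
    mean_s = sum(stock_excess) / n
    sxx = sum((m - mean_m) ** 2 for m in market_excess)
    sxy = sum((m - mean_m) * (s - mean_s) for m, s in zip(market_excess, stock_excess))
    beta = sxy / sxx
    alpha = mean_s - beta * mean_m
    residuals = [s - (alpha + beta * m) for m, s in zip(market_excess, stock_excess)]
    ssr = sum(e ** 2 for e in residuals)
    sst = sum((s - mean_s) ** 2 for s in stock_excess)
    sigma2 = ssr / (n - 2)  # residual variance with n - 2 degrees of freedom
    se_beta = math.sqrt(sigma2 / sxx)
    se_alpha = math.sqrt(sigma2 * (1.0 / n + mean_m ** 2 / sxx))
    return {"alpha": alpha, "beta": beta, "se_alpha": se_alpha,
            "se_beta": se_beta, "r2": 1.0 - ssr / sst}

# Simulated (not real) monthly excess returns in percent
rng = random.Random(42)
market = [rng.gauss(0.6, 5.0) for _ in range(1020)]
stock = [0.6 + 0.6 * m + rng.gauss(0.0, 5.0) for m in market]

res = capm_regression(market, stock)
t_alpha = res["alpha"] / res["se_alpha"]
t_beta_vs_0 = res["beta"] / res["se_beta"]          # H0: beta = 0
t_beta_vs_1 = (res["beta"] - 1.0) / res["se_beta"]  # H0: beta = 1 (market-like risk)
print(f"alpha = {res['alpha']:.2f}% per month (t = {t_alpha:.1f})")
print(f"beta = {res['beta']:.2f} (t vs 0 = {t_beta_vs_0:.1f}, t vs 1 = {t_beta_vs_1:.1f})")
print(f"R-squared = {res['r2']:.2f}")
```

With a long monthly sample the standard error of beta is small, so a beta well below one can be rejected as equal to one, which is the sense in which a stock is statistically "defensive."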
When the market goes up, Coca-Cola on average doesn't go up as much, but when the market goes down, Coca-Cola doesn't go down as much. So a defensive stock provides a bit of a hedge, if you will, against market conditions. Next, the alpha. The alpha is 0.59% per month, so Coca-Cola has beaten the benchmark established by the Capital Asset Pricing Model by a wide margin. Then the R-squared. An R-squared of 0.26 means only a small fraction of the variability, or movement, in Coca-Cola's returns is actually explained by the market: 26% of the variability of Coca-Cola is explained by the market, and the other 74% is explained by non-market, or firm-specific, factors. So alpha, beta, and R-squared are key regression parameters. And you don't just have to take my word for it: go to Morningstar, a key financial website. It reports the alpha, the beta, and the R-squared from a CAPM regression for mutual funds, which helps motivate why we should understand these statistics ourselves. Now that we know how to interpret them, let's go to Morningstar and look at the data for a couple of popular mutual funds. Here are two Fidelity funds: the first one we'll look at is the Fidelity Contrafund, and the second is the Fidelity Magellan Fund. First, the Fidelity Contrafund. Morningstar presents a CAPM regression over the past 15 years of data. Here's the Contrafund, and there's its ticker. What's our market? The market is taken as the S&P 500, so in RM - RF, RM is the S&P 500 return. The alpha from that regression, on an annual basis, is 3.5%. This has been a terrific mutual fund historically. This regression was done for the period ending in 2016, so with 15 years, I guess that would be 2002 through 2016. Over this period, the Contrafund has beaten its benchmark by 3.5% per year after fees, which is remarkable. Now let's look at the Fidelity Magellan Fund, again over the 15-year horizon.
Again, it's the CAPM with the market taken to be the S&P 500; all this is spelled out very nicely by Morningstar. Now that we've had this primer and know how to interpret the result: its alpha is negative. It has underperformed its benchmark on an annual basis by about 2 percentage points. Not very good: that's 2% per year on average over the 15-year horizon. Besides the alpha, we also see the betas for the two mutual funds. The beta for the Contrafund, about 0.8, is a little less than the beta of the market as a whole, which is one. For the Magellan Fund it's a little more than one, 1.13. So we can conclude that the Magellan Fund is, on average, investing in stocks with somewhat more market risk than the Contrafund, given its beta of about 1.1 versus 0.8 for the Contrafund. The R-squared is also reported for both funds. The Magellan Fund has the higher R-squared, 92% versus 82% for the Contrafund, which means market movements explain more of the variability of the Magellan Fund's returns. Later on we'll use R-squared as a measure of how much an active fund is actually a closet index fund. What I love about this page is that it shows the alpha, the beta, and the R-squared aren't just something some eggheads care about; they're on the Morningstar webpage as well.
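The R-squared readings discussed here (92% for Magellan, 82% for the Contrafund, and 26% for Coca-Cola earlier) all rest on the same decomposition. In a one-factor regression with an intercept, R-squared equals the market-driven share of return variance, beta squared times Var(market) divided by Var(return), and one minus R-squared is the fund- or firm-specific share. The Python sketch below checks that identity on simulated data; the series and parameters are arbitrary assumptions, not the Morningstar figures.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)

def sample_cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Simulated (not Morningstar) monthly excess returns: 180 months is roughly a
# 15-year window. The fund's "true" alpha of 0.1 and beta of 1.1 are arbitrary.
rng = random.Random(7)
market = [rng.gauss(0.5, 4.0) for _ in range(180)]
fund = [0.1 + 1.1 * m + rng.gauss(0.0, 1.5) for m in market]

beta = sample_cov(market, fund) / sample_var(market)
alpha = mean(fund) - beta * mean(market)

# Market-driven (systematic) share of the fund's return variance
systematic_share = beta ** 2 * sample_var(market) / sample_var(fund)
specific_share = 1.0 - systematic_share

# The same number computed the usual way, as 1 - SS_residual / SS_total
resid = [f - (alpha + beta * m) for m, f in zip(market, fund)]
mf = mean(fund)
r2 = 1.0 - sum(e ** 2 for e in resid) / sum((f - mf) ** 2 for f in fund)

print(f"beta = {beta:.2f}")
print(f"market share of variance = {systematic_share:.1%}, fund-specific = {specific_share:.1%}")
print(f"R-squared from residuals = {r2:.1%}")
```

A fund whose R-squared is close to one moves almost entirely with its benchmark, which is why R-squared is a natural starting point for spotting a closet index fund.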