Excel and Sheets include a large collection of statistical functions. Two of the most useful in developing models to help predict future events are correlation and regression. Correlation helps us make sense out of the data we collect in our business. For example, does the length of service of our employees in a field sales department correlate to success in sales? A correlation exists when two sets of data change together. Regression is a statistical technique for using one data set to estimate a second correlated data set. For example, in the market for diamonds, the market price is correlated with it's weight. Using regression analysis we can predict the probable price of a new diamond using it's weight. In this lecture, we will use Excel's statistical functions to measure correlations between variables in our business models. We'll review what the correlation coefficient tells us about the strength of the relationship between two variables. We'll explore the diamond market example again to practice the use of regression for predicting a particular diamond's market price, and we'll use multiple regression to improve our forecast. Now let's turn to the spreadsheet. One of the most common of the classic business models is the regression analysis model. To demonstrate regression, let's use a dataset that describes the sales of diamonds in Singapore. The weight of each diamond in a sale is located in column A. The corresponding price is in column B. If we plot the two datasets using the scatter plot as I have done here, the data appear to be almost a straight line, but not perfectly. I can use the Excel correlation function to see how closely changes in weight correlate to changes in price. That is how much do the two numbers vary together? So to do that I'll use the Excel function correl. Correl asks for two arrays. In this case, those are A4 to A51, which is the weight, and secondly, B4 to B51 Which is the price. When I run that correlation, I get the result 0.989. Correlations run from minus 1 to plus 1. A 0 would indicate no correlation, that is the two numbers don't vary together at all. This number 0.98, is a very high correlation. Let me clear away chart 1. Regression analysis takes this one step further. It asks the question, to what extent can the change in one variable be predicted by the change in another? Now this presumes that we have some logic, or theory, suggesting that one number drives the change in another number. In this case, our theory is that weight is deriving price. To run a regression, choose data. Data analysis, And roll down to the regression option, clicking then okay. For the y variable, that is the price we're predicting, choose the range B4 to B51. For the x variable, choose the weight, A4 to A51. Click here to indicate that our ranges include headers. Click the Labels option to indicate that the ranges we have identified include labels. Then click OK. Once you click OK, a regression report will appear in another sheet. The summary output of the regression appears in this second tab. The key numbers for here are the three regressions that are listed at the top. This R represents the correlation we saw before. R squared is the covariance, the amount of change in y that is due to a change in x, according to the regression. The adjusted R squared makes allowances for the relatively small sample size, as you see here, only 47 observations. But this model suggests that 97 or 98% of the change in price can be attributed to a change in the weight. That's the classic regression analysis model used in many business scenarios. Those include scenarios covered by other courses in the business and financial modeling specialization. Let's summarize Module 3. In this module, we used some statistical functions to review historical sales data. Those include means and standard deviations, and we then applied what we learned from that analysis to a sales forecast. We also used functions for including uncertainty in our forecast model. We used the rand() and randbetween() functions for generating random numbers. We talked about the difference between uniform and normal distributions, and we used the data analysis tool pack to generate random numbers that were pulled from the normal distribution. We looked at forecast models that were structured according to discrete time, with columns or rows indicating the passage of time. We then compared that to model structures that allowed for the treatment of time as a continuous variable. We looked at linear and proportionate growth in metrics over time. We began to address the topic of non-linear functions for showing either growth or decline. We practiced the spreadsheet functions EXP, as well as the forecast and growth functions. We created a model that linked the probability of events in a series. And calculated the probability of different series of events using a probability tree. We also incorporated some revenues and expenses into those trees, in order to create a decision tree modeling the expected value of any given branch. Finally, we examined two related sets of data using spreadsheet functions for measuring the strength of correlation between the two sets. We also used tools for running a regression analysis using values from one data set to predict the value of another.