Scatter plots, or scatter diagrams, are used to investigate the possible relationship between two variables that both relate to the same event. A scatter diagram gives you a visual representation of a relationship that can then be confirmed through correlation and regression analysis. Scatter diagrams are useful for exploring root cause. Although they do not by themselves show cause and effect, when used with other evidence they may suggest it. Scatter diagrams are usually created with statistical software such as Minitab, or with Excel or another spreadsheet application, but they're easily plotted by hand.

The independent variable, the one you think might be the cause, goes on the x axis, or horizontal axis. The dependent variable, the one you're trying to fix, goes on the y axis, or vertical axis. For every event or occurrence, two pieces of data are reported: one for the independent variable and a related one for the dependent variable. Data is recorded in x, y pairs, and each point on the plot represents one pair of x and y values.

In this simple example, we will use just three pairs of data. We have created a table or spreadsheet with columns labeled x and y, and each pair of data is plotted on a chart. Remember that the x value should line up with the corresponding point on the x axis and the y value with the corresponding point on the y axis. For example, our first pair is x = 4 and y = 2. We find the spot directly above the x axis value of 4 and directly to the right of the y axis value of 2, and we plot a point there. This process is repeated for every x, y pair. Our example shows just three pairs of data, but a useful scatter plot will probably contain at least 30 pairs.

If the points on a scatter plot cluster in a band running from the lower left to the upper right, there is a positive correlation. That is, when x increases, y increases. This is the condition in the chart at the top of the slide.
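Plotting by hand comes down to placing each x, y pair at the spot above its x value and to the right of its y value. As a rough sketch of that procedure, the Python function below draws a text-only scatter plot. The function name `ascii_scatter`, the grid size, and the second and third pairs are hypothetical (only the first pair, x = 4 and y = 2, comes from the example), and the sketch assumes small non-negative integer values.

```python
def ascii_scatter(pairs, width, height):
    """Return the rows of a text scatter plot; '*' marks each (x, y) pair.

    Assumes non-negative integers with x <= width and y <= height.
    Rows run from the top (largest y) down to y = 0, like a hand-drawn
    chart whose vertical y axis increases upward.
    """
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in pairs:
        # Column = position along the horizontal x axis,
        # row = position along the vertical y axis (row 0 is y = 0).
        grid[y][x] = "*"
    rows = [f"{y:2d} |" + "".join(grid[y]) for y in range(height, -1, -1)]
    rows.append("   +" + "-" * (width + 1))  # the horizontal x axis
    return rows


# (4, 2) is the first pair from the example; (6, 5) and (8, 7) are hypothetical.
pairs = [(4, 2), (6, 5), (8, 7)]
for line in ascii_scatter(pairs, width=10, height=8):
    print(line)
```

Spreadsheet or statistical software does the same placement automatically; the text grid simply makes the "above x, right of y" rule explicit.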
If the points cluster in a band from the upper left to the lower right, there is a negative correlation. That is, when x increases, y decreases. If the points lie close to the line, this indicates a strong relationship: as x changes, y changes in a consistent way. The more closely the points cluster around an imaginary line of best fit, the stronger the relationship between the two variables. If the points do not seem to establish any particular direction, there may be no relationship at all.

These are some real-world examples. In this scatter diagram, there appears to be a positive relationship between SAT scores and college grade point averages. It's positive, but it's not very strong. This scatter diagram shows the relationship between hours worked per week and grade point average for college students. It is probably not surprising that as the number of hours increases, GPA goes down, but there's a surprising upturn at the end: students working over 40 hours a week seem to do better than those working 20 to 40 hours a week.

So far we've talked about linear relationships, that is, relationships that seem to follow a straight line, but that's not always the case. Sometimes relationships follow a curved line.

Like histograms, scatter diagrams provide a useful snapshot, but it's important to remember that this is only a snapshot. More conclusive analysis can be done with correlation and regression. Correlation tells you whether there is a linear relationship between two variables and how strong it is. It does not tell you whether x causes changes in y, whether y causes changes in x, or whether something else causes changes in both of them. If we want to know more, we need to do more analysis. Correlation and regression analysis are topics for future courses, but for now it's enough to know that correlation tells you the direction and the strength of a relationship. A correlation will be some value between -1 and +1.
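As a minimal sketch, assuming the correlation meant here is the usual Pearson correlation coefficient, the Python function below computes it from x, y pairs. The function name `pearson_r` and all three data sets are hypothetical. The third data set shows why a curved relationship can be missed: y depends on x exactly, yet the correlation is zero because the pattern is not a straight line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values xs and ys (-1 to +1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x, y pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means: positive when
    # x and y tend to be above (or below) their means together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)

# Hypothetical curved (U-shaped) relationship: y = x squared.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient; the hand-written version just makes the arithmetic visible.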
A correlation of +1 indicates a perfect positive relationship; that is, all of the points fall exactly on a straight line rising from left to right. A correlation of -1 indicates a perfect negative relationship, and a correlation of zero means no linear relationship between x and y. Regression analysis tells you the nature of the relationship between an independent variable and a dependent variable, and it allows you to create and test a predictive equation.
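The course leaves regression for later, but as a rough illustration of "create and test a predictive equation", here is a minimal sketch of simple linear regression by ordinary least squares, one common form of regression analysis. It fits y = a + b·x, uses the line to predict a new value, and reports R², the share of the variation in y that the line explains. The function names `fit_line` and `r_squared` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x, y pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx            # slope: change in y per unit change in x
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variation in y explained by the line (assumes y not constant)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total
    return 1 - ss_res / ss_tot


# Hypothetical data that lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)                      # 1.0 2.0
print(a + b * 6)                 # 13.0, the predicted y for x = 6
print(r_squared(xs, ys, a, b))   # 1.0, a perfect fit
```

With real, noisy data R² falls below 1; checking it (and the residuals) on data the line has not seen is one way to test whether the predictive equation holds up.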