So, this is the result of hierarchical clustering done using XLSTAT. XLSTAT uses agglomerative clustering, and you can see the y-axis in the dendrogram on the left. The higher up the connection, the more different the groups being connected are. So this branch connects two very different groups. This branch connects two groups that are still fairly different, but not as different as that initial branch. And as we get further toward the bottom, the differences between the individuals being connected get smaller.
We can see the dashed line that the software has drawn for us, and we can also see it on the other graph, which is just a zoomed-in view of this one. If we think about how many clusters that dashed line represents, you can see from the labeling here that the one in blue is one large cluster, and then it looks like we've got two smaller clusters on the other side, one in red and one in green. So here's a visual way of grouping respondents based on the variables that we're using. What we used were the factor scores from our previous exercise. Based on those nine factor scores, it appears that there are three different segments. That's nice: it tells us that we might want to look for three clusters in our data.
What you can do is go through and see which individuals fall under each of these branches, and assign the individuals to clusters based on this approach. A slightly easier way of doing it is to take the result from the hierarchical clustering, which says there are three clusters, and use a non-hierarchical approach. It's referred to as k-means clustering, and it allows me to specify the number of clusters included in my analysis. So, going back to our dendrogram, we've got our three clusters.
That's what we're going to give the k-means analysis as our input. K-means clustering is an iterative method. I tell the algorithm how many clusters I want, and I begin with a random assignment of individuals to the different clusters. Iteration by iteration, we reshuffle that mix. What we're seeking, again, is for the individuals or respondents put within the same cluster to have relatively small differences from each other, and for the cluster centers to be more different from each other. So within-cluster differences are small and between-cluster differences are large.
Within XLSTAT, under the analyze data menu, you choose k-means clustering and specify where the variables are that you want to use as the basis for segmentation. Here, the basis for segmentation is the factor scores for the nine different factors we got out of that automotive survey. We also specify how many segments, or classes, we want. So I've told the computer: here are my nine items, the nine factors, the nine themes that we identified, with each respondent's score on each of those themes, and I want you to allocate my respondents into three different classes.
One of the outputs that we get from k-means clustering is this table that tells us the center of each cluster. Think of it as almost like the average score within the cluster. In this table, I've put the high and low values for the different scores in bold. If we look at cluster one, it has 122 respondents in it, and you'll notice that our clusters are relatively evenly sized. What does cluster one look like? They score higher on the financial freedom dimension, lower on the optimism dimension, and higher on the societal indifference dimension.
They're the highest scoring on the family dimension, and they score low on environmental indifference. Those were the nine factors that were put in as the basis for segmentation. Purchase intent I've added here after the fact; it's not one of our bases for segmentation.
So what we've done is build profiles of the three segments using these nine scores. Segment one seems to have financial freedom, but they're not optimistic and they're not image oriented. Segment two is very optimistic, very patriotic, image oriented and adventurous, and not focused on the family. And segment three seems to be the least patriotic and also the least adventurous.
Now, these segments are of relatively equal size. We can look at the average response, in terms of purchase intention, for each of these three segments, and that's where it gets interesting. What we see is that, far and away, class two, or segment two, has the highest purchase intent. These are the people who are the most interested in this particular product. They're optimistic, they're patriotic, they're image conscious and they're adventurous. So when we go about building our marketing campaign and coming up with the ad creative, these are the individuals that we want to appeal to. The next most interested in the product is segment one, and the least interested is segment three.
So, based on the survey, we've conducted the factor analysis to get a better understanding of the underlying themes. We've used those themes to form three different market segments. And we've identified which segment is the most interested in the product and how we want to communicate with them. That's what segmentation allows us to do.
So, just to review: we collect our information through the survey. We identify those latent drivers, that psychographic profile, using factor analysis.
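To make the k-means and profiling steps described above concrete, here is a minimal sketch in plain Python. It is not what XLSTAT runs internally. The function names (`kmeans`, `cluster_means`), the two-factor toy data, the purchase-intent ratings and the rule for re-seeding an empty cluster are all assumptions made for illustration; the lecture uses nine factor scores from the survey. The sketch starts from a random assignment of respondents to clusters, reshuffles them iteration by iteration, and then averages purchase intent within each cluster after the fact.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, initial_labels=None, seed=0, max_iter=100):
    """Random starting assignment, then alternate center updates and reassignment."""
    rng = random.Random(seed)
    if initial_labels is None:
        labels = [rng.randrange(k) for _ in points]
    else:
        labels = list(initial_labels)
    centers = [None] * k
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Cluster center = average score of its members on each factor.
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
            else:
                # Assumption: an empty cluster is re-seeded at a random respondent.
                centers[c] = list(rng.choice(points))
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # nobody moved, so we have converged
            break
        labels = new_labels
    return labels, centers


def cluster_means(values, labels, k):
    """Average of an outside variable (e.g. purchase intent) per cluster."""
    means = []
    for c in range(k):
        vals = [v for v, lab in zip(values, labels) if lab == c]
        means.append(sum(vals) / len(vals) if vals else float("nan"))
    return means


# Hypothetical data: two factor scores per respondent (the lecture uses nine).
scores = [
    [-2.0, -1.8], [-1.9, -2.2], [-2.1, -2.0],
    [0.0, 2.1], [0.2, 1.9], [-0.2, 2.0],
    [2.0, -0.1], [2.1, 0.1], [1.9, 0.0],
]
# Hypothetical purchase-intent ratings, not used to form the clusters.
purchase_intent = [3, 4, 3, 6, 7, 6, 2, 1, 2]

labels, centers = kmeans(scores, k=3, seed=42)
print("cluster sizes:", [labels.count(c) for c in range(3)])
print("cluster centers:", centers)
print("mean purchase intent:", cluster_means(purchase_intent, labels, 3))
```

Because the starting assignment is random, different seeds can give different cluster numbering and occasionally a poorer solution, which is one reason packages typically let you repeat the run from several starting points.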
Now we've used cluster analysis to build our segments, and we've identified which segment is the most interested in the product. That's going to allow us to build communications material to target those individuals.
Now, the challenge is: how do we reach that segment? What media are they using? Who are these people? That's where another technique comes into play. It's referred to as discriminant analysis, and it is essentially the mirror image of cluster analysis. Cluster analysis said: I have information about customers, but I need to organize them into segments of similar customers. Discriminant analysis says: you tell me who your customers currently are and which segments they belong to, and I'll tell you which criteria are the most important in reaching those customers. Which are the factors you can look at to say that scoring high on this dimension is what puts someone in a particular cluster?
So we can summarize the general idea of discriminant analysis like this. I have a set of individuals for whom I know which segment they belong to, and I have additional information about those individuals. When a new person walks in and I have their demographics or psychographics available, I want to figure out which of those variables are the most diagnostic for assigning them to a particular segment.
As far as the algorithm itself, discriminant analysis comes up with a score for assigning an individual to each of the segments. You'll see that this discriminant function looks similar to linear regression. They share the idea of using a linear combination of your predictors as part of the algorithm. There are two places where they differ from each other. One is the objective function.
We're not trying to minimize our sum of squared errors, like we do in linear regression. What discriminant analysis is trying to do is maximize the hit rate. That is, I want to assign customers to the right segment. If I get it right, that's a success. If I get it wrong, that's a failure. So think of the hit rate as your success rate: how frequently am I accurately assigning people to segments? We want to maximize that. The other place where discriminant analysis and regression differ is the nature of the outcome variable. The outcome that we're using with discriminant analysis is group assignment. Think of that as a categorical outcome, whereas with linear regression we're dealing with continuous measures.
Just to show you the screenshot within XLSTAT, if you're using that particular tool: your outcome variable is going to be qualitative. That's the group membership number: if you have three segments, you belong to segment one, segment two, or segment three. The Xs are the explanatory variables. Those could be quantitative measures or qualitative measures, but they are the predictors that we're going to use to try to assign individuals to different segments.
Now, the important thing that we want to look at with discriminant analysis is just how good a job we're doing at predicting where someone belongs. This is part of the output from discriminant analysis, and I'll walk through how it's structured. This output was produced using SPSS, another software package that you might have access to. Within SPSS, and within most packages, you can look at your original, or calibration, data. We have the actual group membership versus the predicted group membership. If we look at respondents who were actually in segment one and whom we predicted to go to segment one, that's 75.8%.
Respondents who were in segment two and whom we predicted to go to segment two are about 90%. Respondents who were in segment three and whom we predicted to go to segment three are about 83%. And the overall hit rate is given in the footnote at the bottom: 83.3% of the respondents were classified correctly. So the diagonal tells us the accurate classifications. The off-diagonal entries are where we screwed up. That's for the calibration data.
One of the nice things that some software packages do for you is cross-validation. Let me omit some of my data, and let me see how accurately I'm able to predict the membership for those particular observations, even though they weren't used for calibrating the model. And you see we do not do a bad job here: in the cross-validation case, 81.8% were classified correctly. So we don't drop too much.
Now, one of the things that I've done in this exercise: we had the factor analysis, ultimately giving us nine underlying behavioral themes. Those nine factors are the result of a survey, and it was a pretty lengthy survey, around 30 or so items. Suppose that I don't get a chance to give that survey to someone every time they come to a car dealership, but I'd like to know which segment they're in. Instead of asking them 30 questions, what if I were only able to ask them one question that corresponds to each of the factors? How accurately would I be able to classify people? So I've gone from 30 questions down to just nine, and we see that if we can get the answers to just those nine questions, we do a pretty good job of classifying people into these different segments. So that might be one approach: your sales associates may not be able to get all the information from the survey out of consumers, but they can get some of that information.
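The scoring, hit rate and cross-validation ideas above can be sketched in plain Python. The lecture does not say which variant of discriminant analysis XLSTAT or SPSS ran, so this sketch assumes classical linear discriminant analysis with a pooled covariance matrix and priors proportional to segment sizes, and it assumes leave-one-out as the cross-validation scheme. All function names and the two-predictor calibration data are hypothetical. Note that this classical form derives its scores from group means and the pooled covariance rather than searching directly for the best hit rate; the hit rate is how the result is judged.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_lda(X, y):
    """Return {segment: (weights, constant)} for the linear discriminant scores."""
    groups = sorted(set(y))
    p = len(X[0])
    means, priors = {}, {}
    pooled = [[0.0] * p for _ in range(p)]
    for g in groups:
        rows = [x for x, lab in zip(X, y) if lab == g]
        means[g] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[g] = len(rows) / len(X)
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, means[g])]
            for i in range(p):
                for j in range(p):
                    pooled[i][j] += d[i] * d[j]
    dof = len(X) - len(groups)
    pooled = [[v / dof for v in row] for row in pooled]
    model = {}
    for g in groups:
        w = solve(pooled, means[g])  # inverse pooled covariance times group mean
        half_quad = 0.5 * sum(wi * mi for wi, mi in zip(w, means[g]))
        model[g] = (w, math.log(priors[g]) - half_quad)
    return model


def predict(model, x):
    """Assign x to the segment with the highest discriminant score."""
    def score(g):
        w, const = model[g]
        return sum(wi * xi for wi, xi in zip(w, x)) + const
    return max(model, key=score)


def hit_rate(actual, predicted):
    """Share of respondents assigned to their actual segment."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def leave_one_out_hit_rate(X, y):
    """Refit without each respondent in turn, then classify the one held out."""
    predictions = []
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(model, X[i]))
    return hit_rate(y, predictions)


# Hypothetical calibration data: two predictors per respondent, known segments.
X = [
    [1.0, 1.2], [1.5, 0.8], [0.8, 1.6], [1.3, 1.0],
    [4.0, 4.2], [4.5, 3.9], [3.8, 4.6], [4.3, 4.1],
    [7.0, 1.1], [7.4, 0.7], [6.8, 1.5], [7.2, 1.3],
]
segments = [1] * 4 + [2] * 4 + [3] * 4

model = fit_lda(X, segments)
calibration_predictions = [predict(model, x) for x in X]
print("calibration hit rate:", hit_rate(segments, calibration_predictions))
print("cross-validated hit rate:", leave_one_out_hit_rate(X, segments))
```

The toy segments are deliberately well separated. On real survey data the cross-validated hit rate would normally sit somewhat below the calibration hit rate, as in the 83.3% versus 81.8% figures quoted above.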
Now, another way we can think about discriminant analysis being used: suppose I have very detailed surveys where I'm asking hundreds of questions, and I've run the responses through cluster analysis. I'd like to be able to identify those people as they come into my store or my dealership, but in order to do that, I have to rely only on demographic information. We can run discriminant analysis using just demographics. So even if our survey was all based on psychographic, attitudinal responses, we can still see how good a job demographics do at capturing the differences that exist across these segments.
All right, so as far as takeaways from the session. Customer segmentation is a fundamental task within marketing. We're doing segmentation, we're doing targeting, we're doing positioning. As far as forming those segments, we get a lot of the data that we need from surveys. You can also form market segments if you're doing marketing analytics that give you customer-level coefficients, for example conjoint analysis, where you're getting customer preferences for different product attributes. We can form market segments based on those coefficients. But conducting factor analysis and then forming market segments based on the factor scores is a very common way of dealing with survey data.
So in this session we've talked about how we move from having those factor scores to assigning individuals to the different segments. We've built profiles for those segments. We can describe them, and we can say which of these segments is the most likely to be interested in the product.
All right, but what are the next steps? We've said that there are different market segments, and some segments are more interested in the product than others. We still have to know how big those segments are relative to each other.
Which ones can we reach more easily, and what's the appropriate way of reaching those segments? In which segments might we face more competition when we go after them? So this is by no means the end of the road. This is at best somewhere in the middle. We understand our consumers better than we did without doing the segmentation, and we've identified more homogeneous groups of consumers within each of the segments. The next task is to take these results and figure out the marketing mix and the media mix that are going to be appropriate for the segments that we ultimately decide are worth going after.