In this session, we're going to introduce a certain kind of external measures called pairwise measures. What are pairwise measures? That means we take any two points. We see whether they agree between their cluster label and a partition label. So there are four possible cases for any pairs. We have true positive, false negative, false positive and true negative. Then we look at the cases. For example, you get any two points x sub i and x sub j. If they belong to the same partition T and they also in the same cluster C, in that case, this is a true positive. For example, we look at this case. The definition is the true positive is a number of such cases, okay? For example, for any two points x sub i and x sub j, if they have the same true partition label and they also have the same cluster label, that means they belong to the same cluster, then that's the case of true positive. For example, we just look at this case. For these two blue points, they belong to the same ground truth T2 and also they belong to the same cluster C2. That's true positive. Okay. Then what is false negative? False negative means they have the same ground-truth partition label, but on the other hand they are not in the same cluster. For example, you just look at these two brown points. They have the same ground-truth T1, but they belong to different, clusters. So that's the false negative case. Well, what is false positive? False positive means they actually have different partition label, but they are in the same cluster. For example, just look at this blue one and this brown one. They actually do not have the same partition label but they are in the same cluster, okay. What is true negative? True negative pairs actually means that they do not have the same partition label but they are also not in the same cluster. For example, if you look at this black one and this blue one. These two points, they do not have the same partition label, but they are also not in the same cluster. So that's a good case. Then we see how we can calculate the four measures. First, given n points in the data sets, the possible pairs you need to examine, actually the n chooses two, so that's the reason you have this formula. Then for the true positive case is you get all the, the partitions and the clusters, if they agree to each other, that's the case you have n i j, you choose any, these case you choose any two, you get that many cases. Then what is false negative? False negative means you get that many partition cases, but they do not belong to the true positive, okay. What is false positive? That means you have so many clustering cases, but they do not belong to the true positive. Then what is true negative cases? That means for all the cases, they do not belong to any one of the above three cases, then they are true negative. Okay, so that computation we can just simply use those formulas. With the introduction of true positive, false negative, these four measures, then we can calculate other measures like the Jaccard coefficient and Rand statistic. Then, we still take this figure as our illustrative example. Then for Jaccard coefficient, remember we define the Jaccard coefficient before. And this one is the definition for the pairwise measures. And this Jaccard coefficient have the similar kind of flavor. You probably can see. You have true positives divide by all the cases except that you ignore the true negative case, okay. That means the fraction of the true positive points, but after ignoring the true negative cases. Therefore, this computation, positive and negative are different to the, our asymmetric measure. However, for perfect clustering Jaccard coefficient should be one because, you know, all the cases you actually cover. You don't have those false ca, cases. For a Rand statistic is you take all the true cases, true positive plus true negative divided by all the possible pairs. Okay? That means you don't cover anything like a negative. Okay? That's why, if it's perfect clustering, Rand statistic should be 1. It is symmetrical because you take a positive and a negative case just, equally. Then there is another interesting measure called Fowlkes-Mallow measure. This measure is a geometric mean of precision and recall. Remember we just studied F measure, which is a harmonic mean of precision and recall. For geometric mean Fowlkes-Mallow measure, essentially is precision times recall, you get their square root. And once we introduce all these measures, if we wanted to calculate for example any table like this one this green table, we can use this measure to calculate all the numbers. We will leave this calculation as an exercise, instead of spending time lecturing here. [MUSIC]