Yuan Shao, Cao Cao, and Liu Bei led the coalition to claim a new city. They planned to destroy a number of important camps in this city using catapult shots. A number of camps could be destroyed with a single catapult shot if any two of the camps were no more than five miles away from each other. The coalition came up with an initial plan to divide the important camps into three groups to be covered by three catapult shots. But later another plan was proposed. In the new plan the distance between the groups of camps was larger than in the initial plan. This was more desirable because the rescue teams would have a harder time running between the groups after they had been attacked. To ensure they had devised the best plan possible, Liu Bei pulled out the magical tablet. So, Liu Bei is pursuing the armies of Dong Zhuo, and he finds a camp of the army out on the fields in range of his catapults. Now, his catapults fire large explosive charges which can do a substantial amount of damage, so they will be able to damage a group of tents as long as all of them are within five miles of each other. Now, because these explosive charges are very expensive, he wants to make sure that none of the charges overlap: each of the charges will hit a different set of tents. And in order to make recovery as difficult as possible, he wants to spread out the charges as far apart as possible, to get the most effective damage on the army's camp. So, this gives us a clustering problem. We are given a set of points, which are the tents of the army of Dong Zhuo, and we have to divide them into at most k clusters, since we've got k explosive charges to fire. Such that, what? No two points in the same cluster are more than a maximum separation away from each other.
That's the blast distance that the explosive charge will handle, and we're going to maximise the minimal distance between any two points in different clusters, to get the clusters as far apart from each other as possible. So, this is an example of a clustering problem, and clustering problems are very common problems that we solve for many different purposes in the world. Now, this is an example of a pure partitioning problem. Again, we've got a function from a domain to a codomain; it's again a partitioning problem. And why I call it a pure partitioning problem is that we want to partition the domain, which is the spots, into the codomain, which is the clusters. But the codomain is really meaningless: it's really just a set of names for the subsets of the domain that we're going to get. So, in the easy case here we have a bounded number of clusters. All right? We've only got a certain number of partitions that we can get, given by the maximum number of explosive charges. So, our codomain is 1 to k. There's a harder case of this problem, the clustering problem, where we don't know how many partitions we want, and we'll talk about that later on. So, in a pure partitioning problem we just basically have a set, the domain, which we're trying to break into subsets. That's what we're doing: a pure partitioning problem is partitioning this set, the domain, into at most k parts. So, for each element of the domain, we just need a variable to indicate which partition to assign it to. This is a very simple representation, mapping each element of the domain to a variable which takes a value from 1 to k. But we have to be aware that there are multiple representations of the same partitioning, because the partition number is immaterial. So, we've got to remember that with representations in the model, we want exactly one representation in the model for each answer in the problem. We can see that this is not the case here.
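The representation just described can be sketched in MiniZinc as follows (the names n, k, DOM, PART, and part here are placeholders of my choosing, not necessarily the course model's names):

```minizinc
int: n;                   % number of elements in the domain
int: k;                   % maximum number of partitions
set of int: DOM = 1..n;
set of int: PART = 1..k;

% one decision variable per domain element: which partition it is assigned to
array[DOM] of var PART: part;
```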
So, if we look at the different answers in the model: in this first example, we have 1 and 3 mapped to partition 1, 2 mapped to partition 2, and 4 mapped to partition 3. But that's really the same partitioning as the second solution, where we have 2 mapped to partition 1, 4 mapped to partition 2, and 1 and 3 mapped to partition 3. So, what we want to make sure is that only one of these possible solutions is allowed in the model, so that we don't end up with multiple different ways of representing what is really the same solution, because we don't care how the partitions are numbered. So, we need to add constraints to leave only one representative for each partitioning. Because what we have here is what's called a value symmetry: the values 1 to k are symmetric to one another. We don't care about the names of the clusters; whichever way they're numbered doesn't matter. All we care about is which values of the domain end up in which clusters together. This feature arises in many discrete optimisation problems, and so value symmetry is an important thing that we'll talk about at many stages in this course. So, how do we enforce that there's exactly one representation? The basic idea is this: we're going to order the sets of the partition by their least element. If we think about the partitioning example that we looked at, the partitioning {2}, {4}, and {1, 3}, then there's only one way to order it by least element. {1, 3} has the least element 1, so it goes first, then {2}, and then {4}. And so the unique representation is 1, 2, 1, 3: spot 1 goes to partition 1, spot 2 to partition 2, spot 3 to partition 1, and spot 4 to partition 3. That's the only way of representing this partitioning when we order the parts by their least element. So, how can we enforce this? We can do so by saying the least element in partition i is less than the least element in partition i+1.
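One way to write that least-element ordering in MiniZinc is sketched below, assuming decision array shot over spots SPOT = 1..nSpots as in the model that follows. A spot not in cluster c is replaced by a sentinel larger than any real spot, and the sentinels increase with c so that trailing empty clusters still satisfy the strict inequality:

```minizinc
constraint forall(c in 1..k-1)(
  % least spot in cluster c must come before the least spot in cluster c+1;
  % spots not in the cluster are mapped to the sentinel nSpots + c
  min([ if shot[s] = c   then s else nSpots + c     endif | s in SPOT ]) <
  min([ if shot[s] = c+1 then s else nSpots + c + 1 endif | s in SPOT ])
);
```

This is a sketch of the idea rather than the exact constraint from the slides; the global constraint introduced next is the preferred way to state it.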
And here's the constraint that will do this; it basically states exactly that. We look at all the elements which were mapped to partition i, we take the minimum of those, and we say that it is less than the minimum of the elements mapped to partition i+1. We state that for all i in 1 to k-1: we look at all adjacent pairs of partitions, and that's going to remove those value symmetries. So, we can now get back to our catapult problem. We have a number of spots that we're going to attack. Then we have a two-dimensional array which gives, for each pair of spots, the distance between them. And then we have our number of clusters that we're going to build, and the maximum separation that we're going to be able to hit with one attack. And then our decision, of course, is for each spot, in which cluster does it appear? That's our mapping from the domain to the codomain. And then we've got this value symmetry removal constraint, basically saying that of all the spots mapped to partition i, the minimum one is less than the minimum of those mapped to partition i+1. But MiniZinc offers better facilities for value symmetry breaking. We can use a global constraint which is designed exactly for value symmetry breaking, called value_precede_chain. It takes an array of fixed integers c and an array of integer decision variables x, and it ensures that the first occurrence of value c[i] in the array of decisions x is before the first occurrence of value c[i+1] in x. In the general case this means we can use the values in any order that we want, but here we're just using them in the obvious order: we order the cluster numbers from 1 to k and say the first occurrence of cluster 1 is before the first occurrence of cluster 2, which is before the first occurrence of cluster 3, etc., in our shot array of decisions.
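Putting those pieces together, the data, decisions, and symmetry breaking might look like this in MiniZinc (identifier names such as nSpots, d, maxSep, and shot are my guesses at the model's names, not a transcription of the actual slides):

```minizinc
include "value_precede_chain.mzn";

int: nSpots;                      % number of spots (tents) to attack
set of int: SPOT = 1..nSpots;
array[SPOT, SPOT] of float: d;    % distance between each pair of spots
int: k;                           % maximum number of catapult shots
set of int: CLUSTER = 1..k;
float: maxSep;                    % maximum separation coverable by one shot

% decisions: which cluster (shot) each spot is assigned to
array[SPOT] of var CLUSTER: shot;

% value symmetry breaking: cluster 1 appears first, then 2, then 3, ...
constraint value_precede_chain([c | c in CLUSTER], shot);
```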
So, we can use value_precede_chain. Again, the advantage of using the global constraint is that we're passing all of this symmetry-breaking substructure to our solver, and it can do the best job it can with that constraint however it chooses to. So, we can go back and finish off our problem by saying that the maximum distance within a cluster has to be no more than the maximum separation: if any two spots are assigned to the same cluster, then the distance between them has to be less than or equal to the maximum separation. And then we have to look at our objective. Remember, our objective was to maximise the minimum distance between any two points in different clusters. So, how are we going to do this? We're going to look at every pair of distinct spots. If they're in the same cluster, then we'll just substitute the maximum distance between any two spots, so that pair is treated as far away. Otherwise, we'll take the distance between them. So basically, we're going to take the minimum over every pair of spots: if they're in the same cluster, we use that arbitrary maximum distance, so the pair won't end up being the minimum; otherwise, we take the distance between them. And so, with this expression we'll find the minimum distance between any two points which are not in the same cluster, and that's what we're trying to maximise. We do this calculation up front, finding the maximum distance between any two spots, just to use it in this if-then-else expression to make sure that we effectively ignore any two points which are in the same shot. And if we run this revised model with the data, we find that we can cover the camps with two clusters, with an objective of 3.606. The tablet gave a solution with two shots covering all important camps, but the coalition preferred a three-shot plan, since it creates more confusion for the rescue teams.
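The within-cluster constraint and the objective just described could be written as follows, continuing with the assumed names shot, d, SPOT, and maxSep from the sketch of the model:

```minizinc
% any two spots in the same cluster must be within the maximum separation
constraint forall(s1, s2 in SPOT where s1 < s2)(
  shot[s1] = shot[s2] -> d[s1, s2] <= maxSep
);

% largest pairwise distance, computed up front from the data
float: maxDist = max([ d[s1, s2] | s1, s2 in SPOT ]);

% maximise the minimum distance between spots in different clusters;
% same-cluster pairs are mapped to maxDist so they never give the minimum
solve maximize min([ if shot[s1] = shot[s2] then maxDist else d[s1, s2] endif
                   | s1, s2 in SPOT where s1 < s2 ]);
```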
So, Liu Bei pulled out the tablet again to find the most desirable plan. The catapult problem does not require exactly k clusters, but only at most k, so we can have fewer than k clusters in effect, as we saw in our solution: we only used two shots. But we can easily add a constraint to enforce that every cluster has at least one member. We can go back and use the global cardinality constraint to force this, basically saying that every cluster has a lower bound of 1 and an upper bound of nSpots - k + 1 on its size. The lower bound forces at least one element into each cluster: for each cluster i, the number of spots mapped into that cluster is at least one. And of course, because we know the number of spots, and we know that there are k different clusters, we can also put an upper bound smaller than nSpots. We could just leave the upper bound at nSpots, and that would be correct, but by using nSpots - k + 1 we're giving a tighter bound, and the global cardinality constraint can do better reasoning. So, that allows us to force that exactly k clusters are used by adding just this one extra constraint. If we do that and solve our problem now, we force all three clusters to be used and still get exactly the same answer. So, what if we don't know the number of clusters? Well, there's a simple solution to that, simple at least theoretically: we know that the maximum number of clusters we could ever use is the number of spots, so we could just set k to the number of spots and use our model unchanged. Then we would hope that it would pick the appropriate number, reducing the number of clusters. That could be very dangerous if this nSpots number is very big; it could make our problem very hard to solve. The catapult problem is an example of a pure partitioning problem, and we've looked at a few variations of it.
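Those per-cluster bounds can be expressed with the low/up variant of global cardinality from the MiniZinc library (a sketch; the exact form on the slides may differ):

```minizinc
include "global_cardinality_low_up.mzn";

% every cluster value 1..k must occur at least once among the shot decisions,
% and at most nSpots - k + 1 times (the tighter correct upper bound)
constraint global_cardinality_low_up(
  shot,
  [c | c in CLUSTER],               % the cluster values to count
  [1 | c in CLUSTER],               % lower bounds: at least one spot each
  [nSpots - k + 1 | c in CLUSTER]   % upper bounds
);
```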
The original problem had a bounded number of clusters k. Then we added a constraint to force a known number of clusters, exactly k. And we also talked about what to do with an unknown number of clusters. But the important part of this problem was that the cluster numbers are irrelevant: the numbers we assign to each point didn't really matter; what mattered was which points were assigned to the same cluster. This induced a value symmetry, which we needed to get rid of, otherwise we get many, many different ways of representing the same solution. And we removed that value symmetry using the value_precede_chain global constraint. Now, clustering is a very well-studied problem in computer science, and there are lots of special-purpose clustering methods out there. But consider semi-supervised clustering, where we want to add side constraints to a pure clustering problem. For example, we might want to add constraints saying that these two points must be in the same cluster, or these two points must be in different clusters. You can see that's very easy to do with the discrete optimisation approach we're taking here in MiniZinc: we can just add equality constraints or disequality constraints to our model. If we want to put bounds on the sizes of our clusters, we can encode that using the global cardinality constraint, which we already did, in fact, to force each of our clusters to have at least size one. And you can imagine other kinds of constraints about the clusters: their sizes, their perimeters, what happens to the points inside them. Once we get to semi-supervised clustering, discrete optimisation becomes a very attractive technology to solve these kinds of problems.
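Such must-link and cannot-link side constraints are one-liners on top of the catapult model (the spot numbers here are made up for illustration):

```minizinc
% must-link: spots 3 and 7 have to end up in the same cluster
constraint shot[3] = shot[7];

% cannot-link: spots 2 and 5 have to end up in different clusters
constraint shot[2] != shot[5];
```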