Princess Iron Fan agreed to lend Wukong and Zhuge Liang the fan. Wukong headed back to the flaming mountain and tried to blow out the fire, but he found that his attempt was futile. The princess, in fact, had given him a fake fan. To seek revenge, Wukong decided to disguise himself as the princess's husband, the Bull Demon King, who also happened to be his sworn brother. And to convince the princess to give him the fan whilst in disguise, Wukong needed to bring along the Bull Demon King's beast, the Water-Repelling Golden Crystal Beast. Wukong knew that the Bull Demon King often enjoyed wine, and left his beast at the lowest point of the mountain, the Moyun Cave. The area was cloudy throughout the year, and lightning was constantly striking. Zhuge Liang observed that there were three thunder and lightning zones in the area. To reach the Moyun Cave and fetch the beast, they had to pass through these zones. However, Zhuge Liang found the lightning unbearable. Wukong suggested they use a golden cudgel to protect themselves while passing through. As a consequence of using this cudgel, though, the landscape changed wherever it redirected the lightning strikes into the earth. In order to find the Moyun Cave in the changing landscape, Zhuge Liang summoned the dragon of dimensions for assistance. >> Sun Wukong has got the fan of Princess Iron Fan, and when he goes to test it he finds it's actually a fake. She's tricked him. So he thinks, well, he's got a lot of magic powers, one good trick deserves another. So he will convert himself into the Bull Demon King, who is Princess Iron Fan's husband. There's only one problem with this: the Bull Demon King is always riding his beast, a very unique beast, and Wukong can't pretend to be two things at once. In other words, he's going to have to get this beast from somewhere. Now, it happens that at this moment, the Bull Demon King has gone to Moyun Cave for some activities. And what he typically does when he's there is he goes inside and drinks, and he leaves his beast unattended outside. So it's a perfect time for Sun Wukong to steal the beast, and then he can pretend to be the Bull Demon King. So he needs to find Moyun Cave in this complicated area. And now there's a new thing, which is that there are zones of thunder in this area, and he's going to have to avoid them. There are three different zones of thunder. There are the lines where the absolute value of x equals the absolute value of y, that is, |x| = |y|. There are the corners, where |x| + |y| is at least 7. And there's the area in the middle, where |x * y| is less than or equal to 1. All right, so the strength of the thunder differs in these different zones. If he's on one of the lines where |x| = |y|, then the strength of the thunder is 1; if he's not on one of these lines, of course, there's no thunder, and the strength is 0 for that zone. If we look at the second thunder zone, at the corners: if |x| + |y| is less than or equal to 6, he's not in the zone, and the strength of the thunder is 0. If the two absolute values add up to 7, then the strength of the thunder is 1. And if they add up to 8, so he's right in one of the corners, then the strength is 2. Then we look at the third thunder zone: if |x * y| is greater than or equal to 2, he's not in the thunder zone, and the strength of the thunder is 0 for this particular zone. If |x * y| is 1, then the strength of the thunder is 1, and if it's 0, then the strength is 2.
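To make the zone definitions concrete, here is a minimal sketch of the three thunder strengths as just described, assuming integer positions (x, y); it's Python, and the function names are my own, not from the lecture:

```python
def strength_zone1(x, y):
    # Zone 1: the diagonal cross |x| = |y|; strength 1 on the lines, 0 off them.
    return 1 if abs(x) == abs(y) else 0

def strength_zone2(x, y):
    # Zone 2: the corners. Strength 0 when |x| + |y| <= 6,
    # 1 when it equals 7, and 2 when it equals 8.
    return max(abs(x) + abs(y) - 6, 0)

def strength_zone3(x, y):
    # Zone 3: the middle. Strength 0 when |x * y| >= 2,
    # 1 when |x * y| = 1, and 2 when x * y = 0.
    return max(2 - abs(x * y), 0)
```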
So each of these zones has a different strength of thunder. Now, thunder and lightning are not going to affect Sun Wukong, but he has to protect Zhuge Liang. So what he's going to do is put his magic staff up to act as a lightning rod. And because the power of the lightning will strike the ground, it's actually going to cause the ground to rise. So if he is in a thunder zone with a strength of 1, then the ground will rise by a height of 1; if he's in a thunder zone with a strength of 2, it'll rise by 2. All right, so he's actually going to change the landscape by where this lightning strikes. So this is our beast search problem. We're searching for Moyun Cave; that's where the Bull Demon King has left his beast unattended. Ideally, we have to avoid the thunder zones, so those are the constraints. We know that Moyun Cave is not in any of the thunder zones, so we're trying to find the global minimum in this complex and non-convex landscape, where the constraints are the opposites of the thunder zones. So let's have a look at the landscape. We see it's non-convex; it's got lots of local minima. That's what it looks like. And if we look from above, we can see lots of local minima scattered around. And now we can look at the constraints. We have |x| not equal to |y|, which is the cross pattern we've seen before. The sum of the absolute values has to be less than 7, which basically wipes out the corners. And the absolute value of their product has to be greater than 1, which wipes out the bit in the middle. And notice, for the first time, if we look at the intersection of these, we get non-connected areas. Now, we've been using the implicit constraint search method for our previous examples: we've just avoided entering any spot which violates a constraint. But if we do that now, say we start here, we'll find that there's no way of getting out of this corner, because we can't cross over here if we're using our simple neighborhood, where I can change each variable's value by at most 1. So this is a case where the implicit constraint method, where I always keep the constraints satisfied, won't be able to move me around the search space sufficiently, because the space is actually disconnected by the constraints. Of course, if I used a bigger neighborhood, then I might be able to jump over these constraint gaps. All right, so we're talking about penalty methods, since implicit constraints aren't going to work. Penalty methods, remember, are the only general approach that we can use to handle constraints in local search: we treat constraints as penalties. And we saw before that the problem with this is that if a penalty is too high, then we get stuck in local minima and can't move, and if it's too low, then we just never satisfy the constraints; we spend our time finding beautiful local minima, but they don't satisfy the constraints. So we want to know how we can set these penalties appropriately. And the way we're going to do this is with what are called Lagrange multipliers.
And the idea is that we don't set them; we learn them as we search. As we search, we learn which constraints are the most difficult to satisfy, and their penalties get set high, while the others are left alone. So with Lagrange multipliers, we're solving our constrained optimization problem by creating a combined objective function, the Lagrangian function L, which takes both x and a new class of variables, the lambdas. And really it's just defined as this: we take the original objective function and add lambda_1 times p1(x), plus lambda_2 times p2(x), and so on up to lambda_n times pn(x). That is, L(x, lambda) = f(x) + lambda_1 * p1(x) + ... + lambda_n * pn(x). Each of these pi(x)s is a penalty function for the constraint ci(x), and lambda_i is what we call the Lagrange multiplier, which will always be non-negative in our case. It's basically saying: how important is it to satisfy this constraint? And as we find constraints hard to satisfy, we'll increase their Lagrange multipliers, so we take more account of those constraints. All right, so the penalty function is how we represent constraints. And we need the penalty function to behave like this: if the constraint holds, the penalty function gives 0, and if it's violated, it gives a positive number. There are examples where we can have penalties with negative numbers, but that's only for very specific kinds of Lagrangian optimization; in our case, the penalty is always positive for non-solutions. Let's look at some choices of penalty functions. We've talked about this already: the obvious thing to do is just say, if the constraint holds, then we get a penalty of 0, and otherwise we get 1. The problem with this is that we get a very rugged landscape, and we don't really take into account by how much we're violating the constraint. We don't know if we're close to satisfying the constraint or far away. So we're better off measuring some degree of violation. Of course, the penalty is 0 if the constraint holds, but otherwise the penalty should be some kind of measure of the distance from the current place to the nearest solution of the constraint. That means that, as we move around, moving closer to a solution is rewarded, and so we're more likely to be able to find a solution. It gives us a more fine-grained view of what we're doing in the search. So let's have a look at some example penalty functions. If we're just trying to make some expression e equal 0, we can take the absolute value |e|, and that has the right behavior: either the expression is 0, in which case the penalty is 0, or it has a non-zero value and we get a positive penalty. If we're looking to satisfy e <= 0, we can just take max(e, 0). And obviously, if e takes the value 4, we're 4 away from satisfying that constraint, and we get a penalty of 4, so that makes perfect sense. For not-equals, we don't have a very expressive way of thinking about it: if e takes the value 0, so it violates the constraint e != 0, then we get a penalty of 1; otherwise, we get 0. And then we can combine these penalty functions. If I have a disjunctive expression, c1(x) or c2(x), then basically I take the two penalty functions and take their minimum value. Because, of course, if I have to satisfy this disjunction and I'm close to satisfying one of the disjuncts, then that's the distance I'd use for the whole disjunction.
So the minimum is the right way of combining them. For conjunction, I get the penalty function by adding up the two penalty functions, because you can think about it like this: if I need to satisfy both of them, I'm going to have to move at least this far to satisfy the first one and at least this far to satisfy the second one, and so I'm that far away, the sum of the two, from a solution. And negation is again a bit like not-equals, very weak: if the penalty function returns 0 for the constraint I'm negating, then I get 1; otherwise, I get 0. All right, what about global constraints? Because, of course, we're used to using global constraints in this course for everything. How do we work out penalties for them? We can do it by decomposition. Think about alldifferent: alldifferent is really equivalent to the conjunction of all these pairwise disequalities, and we can just work out the penalties for each of those. We worked out the rule for conjunction, so we would basically add up the penalties for each of these disequalities separately, and that gives us a penalty for alldifferent, which would be fine. Or we can build a direct penalty function. In this case, one for alldifferent looks at every possible value the variables can take and sums up, for each value, the number of times it is taken beyond the first. So we're adding up the number of times a value v is used more than once. And notice this actually gives a different value than the decomposition; they're not the same penalty function, but this one is easier to compute. With the decomposition, we'd have to check n-squared disequalities; here we just sum up some numbers. All right, what about cumulative? We can define a special penalty function for that as well. Think about: how close are we to satisfying a cumulative constraint? Well, we're not satisfying it when we overuse a resource. So basically, we can just add up the overuses of the resource. If we had this timetable in front of us, we'd say, well, we've only got this amount of resource, and we're overusing it by 1 here between time 4 and 5, and overusing by 1 here between time 5 and 6. So the total overuse is 2, and that would be our penalty. I can build a penalty function which does that by basically building the timetable and counting the overuses. So if we think about the constraints in the problem we're trying to solve, then for this |x| != |y| constraint we basically take max(1 - ||x| - |y||, 0), which gives us 0 if the constraint holds, and 1 if the two absolute values are equal. Then we can apply the rules for inequalities, the rules for an expression less than or equal to 0, to get the other two. With a little bit of fiddling around, we get the max expression max(|x| + |y| - 6, 0). And you can see that that agrees with what we saw at the beginning, when we were talking about the strength of the thunder: if |x| + |y| is less than 7, we get 0, because the constraint is satisfied; if it equals 7, we get 1; and if it's greater than 7, we get 2, because the only way it can be greater than 7 in our problem is to be 8. And similarly, the constraint |x * y| > 1 gives us the penalty function max(2 - |x * y|, 0).
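Here is a sketch of those penalty rules in code; again Python with illustrative names of my own, not anything from the slides, assuming integer decision variables:

```python
from collections import Counter

# Primitive penalties: 0 exactly when the constraint holds, otherwise
# a measure of how far we are from satisfying it.

def p_eq0(e):       # constraint e = 0
    return abs(e)

def p_leq0(e):      # constraint e <= 0
    return max(e, 0)

def p_neq0(e):      # constraint e != 0 (weak: only ever 0 or 1)
    return 1 if e == 0 else 0

def p_or(p1, p2):   # disjunction: distance to the nearer solution
    return min(p1, p2)

def p_and(p1, p2):  # conjunction: must repair both, so add the distances
    return p1 + p2

def p_alldifferent(xs):
    # Direct penalty: for each value, count the times it is used beyond the first.
    return sum(c - 1 for c in Counter(xs).values() if c > 1)

# The three thunder-zone constraints of the beast-search problem.

def pen1(x, y):     # |x| != |y|; on integers this equals p_neq0(abs(x) - abs(y))
    return max(1 - abs(abs(x) - abs(y)), 0)

def pen2(x, y):     # |x| + |y| < 7, i.e. |x| + |y| - 6 <= 0
    return p_leq0(abs(x) + abs(y) - 6)

def pen3(x, y):     # |x * y| > 1, i.e. 2 - |x * y| <= 0
    return p_leq0(2 - abs(x * y))
```

Note that pen1, pen2 and pen3 return exactly the thunder strengths we sketched earlier: 0 where the constraint holds, and the strength of the zone otherwise.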
And each of these penalties can take only three possible values, 0, 1 or 2, for the particular values x and y can take in our problem. All right, so remember the problem we're trying to solve: minimize f(x) subject to this set of constraints. What we want to do is find a point that satisfies all the constraints, obviously, and where the f(x) value is minimal. That's a constrained global minimum; that's what we're trying to find. What we're going to do is relax our goal a bit and look for a constrained local minimum. A constrained local minimum is a point where all the constraints hold, so at least we've satisfied all the constraints, and where, if we look at all of our neighbors, their value of f is higher than ours, so we're minimal with respect to our neighbors. Clearly, the constrained global minimum is one of the constrained local minima, and so this is a way of trying to find a constrained global minimum. The way we're going to do this is what we call discrete Lagrange multiplier search, a discrete version of Lagrange multiplier methods. It's simple: we start with a random valuation, and we start with all of our Lagrange multipliers set to 0. Then we record where we are in terms of the current solution and the current Lagrange multipliers, and we do a steepest descent. But notice that what we're descending on is this Lagrangian function, not the original objective: the Lagrangian function takes into account the Lagrange multipliers times the penalties. And once we finish that, we update the Lagrange multipliers: any constraint which is violated will cause its Lagrange multiplier to be made stronger. This is basically recording that these are constraints we weren't able to solve, so we should try harder next time to solve them. And we just keep going around that loop until we reach a fixed point. So let's have a look at DLM in action. Here is our search space, and we've marked with a cross all of the positions which violate one of the constraints; remember, we have our three different constraints, these ones. The search space, we can see, is disconnected: when I'm in this part of the search space, moving either diagonally or orthogonally, I can't actually reach down to this other part. That's one of the reasons we're going to use this penalty method, so that we can move around the search space freely. Now here, we've color coded all of the spots in the search space with the value of f and which constraints are satisfied. The green spots, like this one here, satisfy all the constraints, so they're the places we'd like to end up in. The yellow spots violate the |x| != |y| constraint. The red spots violate the don't-be-in-the-corners constraint, so these ones here. And the blue spots violate the |x * y| > 1 constraint, so we can see them in the middle. Then we've color coded the spots which violate two constraints with the colors of both constraints they violate; none of the spots here violates all three. So, for example, this spot here violates this constraint and this one. And the function we're trying to optimize is just the original function to begin with, because we've set all of the Lagrange multipliers to 0.
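As a quick sketch of the combined objective (again Python, with illustrative names): with every multiplier at zero, L collapses to the original objective f, which is exactly the situation in this first plot.

```python
def make_lagrangian(f, penalties, lambdas):
    # L(x, lambda) = f(x) + sum_i lambda_i * p_i(x)
    def L(x):
        return f(x) + sum(lam * p(x) for lam, p in zip(lambdas, penalties))
    return L

# With lambdas = [0, 0, 0], L(x) == f(x) at every point x.
```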
So if you think about it, this Lagrangian function L is exactly the original function to begin with. So we've put the objective value in all of these spots here, and we've marked all of the local minima in this plot at the moment. Of course, we don't know this; we're going to discover it in the actual search. But if we had a global view of everything that was going on, then we could actually see all the local minima for the current objective function we're trying to optimize. So let's assume that alpha is 1, and we'll start over here at position (2, 2). Remember what we're doing: we're changing one or both variables' values by 1. So we look at our neighbors and we choose the best downhill move. We just did that; this is all fine, this is just normal search. Now I'm at this point here; I look at our neighbors and again choose the best downhill move, and I get to here. I look at all the neighbors, choose the best downhill move, and that gets me here. Now I look at all my neighbors, and none of them are better, so we've reached a local minimum. So what do we do now? We look and say, okay, we've reached a local minimum; let's see which constraints are satisfied. The first constraint is not satisfied: there's a penalty of 1. The second constraint is satisfied, and the third constraint is not satisfied, so that's another penalty of 1. So the rule for updating the Lagrange multipliers says we add 1 to each of those two Lagrange multipliers. And that changes the Lagrangian function we're trying to optimize: it adds copies of those two penalty functions in. And so that's going to change the landscape: the landscape that we're searching over has changed; things have moved around. Once we do that, we come back to this point here, we reevaluate our neighbors, and we find that we're not at a local minimum anymore; we've basically pushed the landscape up where we are. Now we've got another local minimum here, and we look at its neighbors; we go to there. We look at its neighbors, and now we find a constrained local minimum. Notice it's better than all its neighbors, but it also doesn't violate any constraints, so this is a constrained local minimum. It doesn't matter what we do now: changing Lagrange multipliers will not change the fact that this is a local minimum. So what is actually happening in this DLM search? We're performing a steepest descent search, but rather than on the original objective f, we're descending on this combined objective L, which combines the constraint violations with the original objective. And what happens is, when we reach a local minimum, some constraints might be violated. And really, that's evidence that these constraints are hard to solve, at least at the moment, because we're not solving them. And so they should become more important; we should give them higher priority. The way we do this is we update the Lagrange multiplier for each of those constraints, and we update it by an amount which is proportional to the amount of its violation, multiplied by this multiplier alpha. So how does this stop? We stop when we didn't change the point we're at and we didn't change the Lagrange multipliers. Obviously, if we don't change the point we're at, then we must be at a local minimum; in fact, any time we come around that loop, d* is a local minimum, because we've just run local search.
And the other thing is that if the lambdas didn't change, then all the penalty functions must be 0; if they weren't all 0, we would have updated the lambdas. So basically, this whole procedure terminates when we've found a constrained local minimum, which is also called a saddle point. So we've definitely found a local minimum of f at which all the constraints are satisfied. Let's have a look at DLM in action once more. We have the same multiplier alpha; we're just going to start at a new point, (-4, -4). So if we start off here, we look at our neighbors and move to the best neighbor. We look here, and we're at a local minimum: none of our neighbors is better, and the first constraint is violated. So we're going to update the Lagrange multiplier here, and that's going to change the Lagrangian function we're optimizing. Now, that also changes the landscape: basically, anything to do with this first constraint has been penalized, and that may change where some of the local minima are. Then we look at our neighbors again, hoping that, because we've been pushed up, we're no longer at a local minimum. As it turns out, we're still at a local minimum, and it's still the same constraint that's violated, which is unsurprising since we haven't moved. So we update our Lagrange multiplier again: we've got a violation of 1, so it gets updated by another 1, and now our Lagrangian function has 2 times that penalty in it. We're basically saying this constraint is hard to solve, so we should make it more important. And again, the whole space changes, at least the spots that had some yellow in them, because their penalty was changed. Now we look at our neighbors again, and we're still at a local minimum. We update the Lagrange multipliers again, and again, implicitly, we update all of the spots which have a yellow mark on them: their penalty values change, because we've changed how we're evaluating each of these points. Of course, in the actual search, we're not going to see this; we don't care about this. It's just that we're going to use this function to evaluate points from now on. We again evaluate our neighbors, and we're still at the local minimum. We go through the cycle again; we update; we're still at a local minimum; we go through again. You can see that we're just not getting out of this local minimum, which is very deep. We're still at this local minimum after updating seven times now; we've recorded seven violations while sitting here. And finally, we're not at a local minimum anymore, so now we can move somewhere else, thankfully. We go to there, and we find we're at a local minimum. But at least we've got a different constraint being violated this time: we're violating the second constraint. So again, we update the Lagrange multiplier, for the second constraint this time, and that becomes part of the Lagrangian function we're optimizing, which changes the whole landscape a little bit; it might change some of the local minima. Now we look at our neighbors, and we have a neighbor which is better. Unfortunately, it's back where we were before, and we keep going. Of course, we look at our neighbors again and, damn it, we're back at a local minimum. So we're back to where we were before, back at a local minimum, and we're still violating this first constraint. Right, so let's update the Lagrange multipliers again.
So that multiplier goes up to eight, which is going to change the values of the positions with yellow violations, and we reevaluate. And okay, now we've got a better neighbor, so we're not at a local minimum anymore, and we go down to that neighbor we were at before. And when we evaluate it, it's a local minimum. Right, so we go through this process again. The second constraint is violated; we update its Lagrange multiplier, so that updates our Lagrangian function, and that's also going to change all of the values in the red squares. Then we look at our local neighborhood and we find a new spot that's better. So we move there, and it's a local minimum, again! All right, we update the Lagrange multiplier once more, and that updates everything, and we look at our neighbors. And finally we find a place which is not one of the two spots we've been bouncing between, so we've finally moved out of there. And we find that this also has a better neighbor, and now we're at a local minimum. And this time the difference is that all the constraints are satisfied; in fact, we've found the global minimum. So you can see it took us a long time to get out of this area, but eventually we got out, because the Lagrange multipliers eventually made these areas unattractive, even though they looked attractive under the original objective function. So let's have a think about what we've done. You can see that different initial points can matter strongly in this discrete Lagrange multiplier method. It's just like other local search methods: where we start changes where we end up. And we can only find constrained local minima; we've got no guarantee that we find the global minimum. So we need to think about combining this with other methods like restarts, so that if we find a constrained local minimum, we can go and look for another one. And we haven't talked about cycling and plateaus. If we're on a plateau which violates a constraint, we're going to keep making those positions less and less attractive, and so eventually we'll move out of them. But if we're on a plateau which has no constraint violations, then that plateau is a local minimum, and we have all the same problems with plateaus that we've seen before. We can combine this with tabu lists as one way of handling plateaus. So there are lots of considerations in using Lagrange multipliers. How fast should we update them? That's this value of alpha. How often should we update them? We've shown an example where we only update them at local minima, but you can also update them more frequently; you can update them at every step of the local search. That's a possibility. Another question is: how do you split constraints? If you split constraints to get more multipliers, then you have the disadvantage that you've got to keep track of more things: more penalty functions to evaluate and more multipliers to maintain. But the advantage is that you have a less rugged search space. If I have a single multiplier for alldifferent, then if I keep on violating alldifferent, it seems like it's very important, and its multiplier gets bigger and bigger. But it might be that I was violating alldifferent because of some disequalities early in the search, and later on I'm violating it because of different disequalities.
And so by having a multiplier for each disequality, I've got finer-grained tracking of which constraints are difficult to solve. Another question is: should we ever reduce multipliers? Typically we end up doing this, at least for arithmetic stability. Another reason is that constraints which were really important and hard to solve early in the search may have become easy, and we don't really have to worry about them anymore; keeping their multipliers high just makes it harder to find the constrained local minima we're looking for. Right, in summary, this discrete Lagrange multiplier method is a principled way to adjust penalties. This is really important, because we've already seen that the penalty-based approach is really the only general way that a local search method can handle arbitrary constraints. And this DLM search will find a constrained local minimum, given enough time: it'll run around, but eventually it will get there. It's still not a complete answer, though. We need to know how fast to adjust the multipliers, how many multipliers we need and how we map constraints to multipliers, when to adjust the multipliers, and when we should possibly make them smaller. And we can combine DLM with other methods to increase exploration and avoid cycling.
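To tie the lecture together, here is a compact sketch of the DLM loop as described: steepest descent on the combined objective L, then increase the multiplier of each violated constraint by alpha times its violation, stopping when neither the point nor the multipliers change. This is Python under my own assumptions (a neighbours function, penalties taking a single point); it is not code from the course.

```python
def dlm(f, penalties, neighbours, x, alpha=1):
    """Discrete Lagrange multiplier search (sketch).

    f          -- objective to minimise
    penalties  -- penalty functions p_i(x), 0 exactly when constraint i holds
    neighbours -- maps a point to the list of its neighbouring points
    x          -- starting point
    """
    lambdas = [0] * len(penalties)

    def L(pt):  # combined objective L(x, lambda) = f(x) + sum_i lambda_i * p_i(x)
        return f(pt) + sum(lam * p(pt) for lam, p in zip(lambdas, penalties))

    while True:
        # Steepest descent on L until we reach a local minimum of L.
        while True:
            best = min(neighbours(x), key=L, default=x)
            if L(best) >= L(x):
                break
            x = best
        # Update multipliers by alpha times each constraint's violation.
        viols = [p(x) for p in penalties]
        if all(v == 0 for v in viols):
            return x  # constrained local minimum (a saddle point)
        lambdas = [lam + alpha * v for lam, v in zip(lambdas, viols)]
```

On the beast-search problem, this would be called with the three penalties from earlier wrapped to take a point, for example dlm(f, [lambda pt: pen1(*pt), lambda pt: pen2(*pt), lambda pt: pen3(*pt)], grid_neighbours, (2, 2)), where grid_neighbours is some function (assumed here) that changes each coordinate by at most 1.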