The belief propagation algorithm, when run over a general cluster graph, is an iterative algorithm in which messages are defined in terms of previous messages. And we said before that when run over a general cluster graph, the algorithm might not converge, and that even if it converges, it might not get the right answers. What we're going to talk about now is how bad these problems are, and what are some of the tricks that we use in practice to improve the behavior of this algorithm in these two regards. So let's get some intuition by looking back at our misconception example. This is not actually quite the misconception example; it's one where we modify some of the potentials, specifically the potential over A and B, to make the behavior even more extreme, making these potentials larger so that there is a stronger push toward agreement between A and B. So A and B really, really prefer to agree. And what happens in this example? We can see this beautiful oscillatory behavior of the belief propagation algorithm: the x axis is iterations of BP, and the y axis is some notion of distance to the true distribution, to the true marginals. And why do you have such strong oscillations? Let's look at what these potentials are doing. So this potential over here pushes A and B to agreement. And so, as messages are passed, A and B want to agree, D and A also want to agree, B and C want to agree, but C and D really want to disagree. And so, as messages are passed, you're getting conflicting messages from both sides. Take D, for example: on the direct edge, C keeps pulling at it to disagree with it. On the other hand, when you go all the way around the loop, C is actually urging D to agree with it. And so, from one side you're getting a pull for C and D to be the same, and from the other side you're getting a pull for them to be different.
And this conflict over the cycle, over the loop, causes the oscillatory behavior as messages are passed from one side or the other. This type of configuration, specifically tight loops, strong potentials, and conflicting directions, is probably the single worst scenario for belief propagation. This is the case where it's going to do the worst, both in terms of convergence and in terms of the accuracy of the results that are obtained. Now, in this example, ultimately you actually do get convergence. You can see it takes about 300 iterations, but ultimately you get convergence; 300 is a lot for a graph this size. Here is one where, up to 500 iterations, you didn't get convergence at all; maybe if you ran it for 10,000 it might converge, but it's hard to say. So how do we improve the convergence, as well as the accuracy, of the algorithm? First, let's look at what not to do. Here is the most important thing not to do, which is a variant of belief propagation called synchronous belief propagation. In synchronous belief propagation, which was actually one of the most commonly used variants at the very beginning of the use of the BP algorithm, all messages are updated in parallel. So all of the clusters wake up, they look at all of their incoming messages, and they compute all of their outgoing messages, all at once. Now, this is a great algorithm from the perspective of simplicity of implementation, and also of the ability to parallelize, because you could assign a processor to each of the clusters, and they're all working in parallel with no dependencies between them. Unfortunately, synchronous belief propagation is actually not a very good algorithm. And so what you see here is the number of messages that have converged as a function of the amount of time spent. This is an Ising grid with certain parameters; the details don't matter.
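To make "distance to the true marginals" concrete, here is a minimal sketch of a frustrated four-variable loop like the one described: three pairwise potentials push their endpoints to agree, and one pushes C and D to disagree. The numbers are illustrative, not the lecture's actual potentials. The true marginals, which the BP beliefs would be compared against, are computed by brute-force enumeration:

```python
import numpy as np
from itertools import product

# Illustrative frustrated loop over binary variables A, B, C, D (indexed 0..3).
# Three edges favor agreement, the C-D edge favors disagreement.
# These numbers are made up; the lecture's potentials differ.
AGREE = np.array([[10.0, 1.0], [1.0, 10.0]])
DISAGREE = np.array([[1.0, 10.0], [10.0, 1.0]])
psi = {(0, 1): AGREE, (1, 2): AGREE, (2, 3): DISAGREE, (3, 0): AGREE}

def true_marginals():
    """Exact single-variable marginals by summing over all 2^4 assignments."""
    marg = np.zeros((4, 2))
    for x in product(range(2), repeat=4):
        w = 1.0
        for (i, j), pot in psi.items():
            w *= pot[x[i], x[j]]
        for v in range(4):
            marg[v, x[v]] += w
    return marg / marg.sum(axis=1, keepdims=True)

print(true_marginals())
```

For these fully symmetric potentials, every true marginal is uniform (the weights are invariant under flipping all four variables at once), which makes BP's oscillation around the answer especially easy to see on a plot.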
And you can see that there is a certain improvement over time, and then it kind of asymptotes at a certain number of messages that have converged. By contrast, let's compare that behavior to asynchronous belief propagation, where messages are updated one at a time: so this one, and then that one. Now, notice that this algorithm is underspecified, because I didn't tell you what order to do the updates in; we'll come back to that in a minute. But even by the simple virtue of passing messages in an asynchronous way, the behavior improves, both in terms of how quickly messages converge and in terms of the number of messages that converge. Now, this is not a particularly good message-passing order. Here's a much better message-passing order. It takes a little bit longer to get certain messages to converge, because it's trying to make sure that everything is converging, but notice that eventually, and in not that long a time, it actually gets to a 100% convergence rate, okay? So, here are some important observations. First, convergence is a local property in belief propagation: some messages converge quite quickly, others might never converge. And when some messages still haven't converged after you run the algorithm for a certain amount of time, one often simply stops the algorithm and says, this is what we have, and we're done. Second, synchronous belief propagation has considerably worse convergence properties than asynchronous belief propagation, which is why, at this point, very few people ever use synchronous belief propagation. And third, if we're using asynchronous belief propagation, the order in which messages are passed makes a difference, both to the rate of convergence and, perhaps even more surprisingly, to the extent to which messages ever converge at all. So how do we pick the message-passing order? There are several different scheduling algorithms.
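The two schedules can be sketched concretely with generic sum-product messages on a pairwise model (the helper names and potentials here are my own, not from the lecture): a synchronous sweep computes every new message from the previous iteration's messages, while an asynchronous sweep uses each new message immediately. On a tree, a few sweeps of either schedule reproduce the exact marginals.

```python
import numpy as np

def neighbors(v, edges):
    return [u for (a, b) in edges for u in ((b,) if a == v else (a,) if b == v else ())]

def pot(psi, i, j):
    # pairwise potential indexed as [x_i, x_j], for either edge direction
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

def new_message(i, j, psi, msgs, edges):
    # m_{i->j}(x_j) = sum_{x_i} psi(x_i, x_j) * prod_{k in N(i)\{j}} m_{k->i}(x_i)
    inc = np.ones(2)
    for k in neighbors(i, edges):
        if k != j:
            inc = inc * msgs[(k, i)]
    m = (inc[:, None] * pot(psi, i, j)).sum(axis=0)
    return m / m.sum()

def sync_sweep(msgs, psi, edges):
    # synchronous: all messages recomputed in parallel from the old messages
    return {e: new_message(*e, psi, msgs, edges) for e in msgs}

def async_sweep(msgs, psi, edges):
    # asynchronous: each new message is used immediately by later updates
    for e in list(msgs):
        msgs[e] = new_message(*e, psi, msgs, edges)
    return msgs

# On a tree (a 3-variable chain) a few asynchronous sweeps give exact marginals.
edges = [(0, 1), (1, 2)]
psi = {(0, 1): np.array([[4.0, 1.0], [1.0, 2.0]]),
       (1, 2): np.array([[3.0, 1.0], [1.0, 2.0]])}
msgs = {(i, j): np.ones(2) / 2 for e in edges for (i, j) in (e, e[::-1])}
for _ in range(3):
    msgs = async_sweep(msgs, psi, edges)
belief0 = np.ones(2)
for k in neighbors(0, edges):
    belief0 = belief0 * msgs[(k, 0)]
belief0 = belief0 / belief0.sum()
```

On the chain both schedules reach the same fixed point; the difference the lecture describes only shows up on loopy graphs, where the asynchronous schedule (and the order within it) matters.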
I'm going to present two of the more popular ones. The first is called tree reparameterization, or TRP for short. What it does is pick a tree, and then pass messages in that tree in the same way that you would in a clique tree algorithm, to calibrate that tree, keeping all other messages fixed. So, for example, we might start by calibrating the red tree, which means I pass messages this way and then back in the other direction, keeping all other messages fixed. Now I pick a different tree, say the blue tree. Here is a blue tree. Notice that I can't close the loop, because otherwise it wouldn't be a tree, so I can't go all the way around back to where I started. And now I calibrate in the other direction. And I continue to do that, taking a tree and calibrating it, and the main constraint that I need to satisfy is that my trees cover all edges. That's a constraint, because otherwise I'm going to miss a certain message that needs to be passed. The second is not so much a constraint as something that tends to improve performance: it's good for the trees to be large, which means it's good to pick spanning trees, trees that span as much of the graph as possible without containing a loop. So, spanning trees are good. That's one message-scheduling algorithm. The second message-scheduling algorithm is something called residual belief propagation, and there are actually several variants of it now. What it does is look for good messages, messages that are high value. So it looks for two clusters whose beliefs disagree a lot, because if they disagree, that means that passing that message is potentially going to have a large impact on the receiving cluster. And so I'm going to keep a priority queue of edges.
And I'm going to pick the messages from the top of the queue, based on how much of an effect I think they're going to have, which is some kind of heuristic notion, potentially, of the disagreement between the two adjacent clusters. The other big, important trick that people use to improve the convergence of the belief propagation algorithm is what's called smoothing, or sometimes damping, of messages. This is a general trick that's often used to reduce oscillations in dynamic systems that are based on these kinds of fixed-point equations, where the left-hand side is defined in terms of the right-hand side. So here we define the message delta i,j in the original belief propagation algorithm as a function of the other deltas, and we've seen that this can give rise to oscillatory behavior. This is just the standard belief propagation message. What we're going to do now, instead, is a kind of smoothed version: I'm not going to let the message change too drastically. I'm going to take a weighted average, with weight lambda, of the new message, which is this thing, and the old message. And that turns out, again, to damp oscillations in the system and increase the chances of it converging. So let's look at some example behaviors. Here we see synchronous belief propagation in red, over here, and this is the fraction of messages converged, so one would be perfect convergence. We can see that synchronous plateaus at about 20 to 25% of messages converged, which is low enough to be pretty much useless in practice: if only 20% of your messages converge, then the remaining messages really don't mean very much. Asynchronous without smoothing is the green line, this one. And we can see that, even without smoothing, the asynchronous algorithm achieves considerably better performance than the synchronous algorithm.
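The damped update can be sketched as follows, on a toy frustrated four-cycle with made-up potentials and an illustrative node potential (none of these numbers come from the lecture). Each freshly computed message is averaged with the old one, with weight lambda, before it replaces it:

```python
import numpy as np

# Toy frustrated 4-cycle: three agreement edges, one disagreement edge (C-D),
# plus an illustrative node potential on variable 0. All numbers made up.
AGREE = np.array([[2.0, 1.0], [1.0, 2.0]])
DISAGREE = np.array([[1.0, 2.0], [2.0, 1.0]])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
psi = {(0, 1): AGREE, (1, 2): AGREE, (2, 3): DISAGREE, (3, 0): AGREE}
phi = {v: np.ones(2) for v in range(4)}
phi[0] = np.array([3.0, 1.0])

def pot(i, j):
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

def neighbors(v):
    return [u for (a, b) in edges for u in ((b,) if a == v else (a,) if b == v else ())]

def damped_sweep(msgs, lam=0.5):
    """One asynchronous sweep with smoothing; returns the largest message change."""
    biggest = 0.0
    for (i, j) in list(msgs):
        inc = phi[i].copy()
        for k in neighbors(i):
            if k != j:
                inc = inc * msgs[(k, i)]
        new = (inc[:, None] * pot(i, j)).sum(axis=0)
        new = new / new.sum()
        # smoothed update: delta <- lam * new + (1 - lam) * old
        smoothed = lam * new + (1.0 - lam) * msgs[(i, j)]
        biggest = max(biggest, np.abs(smoothed - msgs[(i, j)]).max())
        msgs[(i, j)] = smoothed
    return biggest

msgs = {(i, j): np.ones(2) / 2 for e in edges for (i, j) in (e, e[::-1])}
for sweep in range(200):
    change = damped_sweep(msgs)
```

With lam = 1 this reduces to the standard update; smaller lam changes the messages more gradually, which damps oscillations at the cost of slower progress per sweep.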
And the figure doesn't even show synchronous with smoothing, because it would be even worse. And then finally, the blue line is asynchronous with smoothing, and we can see that it achieves perfect convergence at some point. You can also look at behaviors of individual messages, or rather individual marginals, individual beliefs. This is still an Ising grid, by the way; the Ising grid is a sort of standard setup for trying out the convergence of various approximate inference algorithms. What you see here, the black line, is the true marginal. And you can see that synchronous belief propagation basically flip-flops around it indefinitely, whereas asynchronous belief propagation converges pretty quickly, and very close to the right answer. Not always, though: this is a different variable in the same model. The black line, again, is the truth. And here we can see that, sure enough, asynchronous does improve convergence, but convergence is still to an answer that's not quite right. And you see both behaviors in real networks, in practice, depending on all of the factors that I mentioned at the very beginning: how tight the loops are, how strong or peaked the potentials are. Those determine how many of the factors have this oscillatory behavior, and how wrong the answers are. So, just summarizing: there are two main tricks for achieving BP convergence, damping (or smoothing) and intelligent message passing. Convergence is often easier to obtain using these tricks than correctness, and convergence unfortunately doesn't guarantee correctness. And the bad cases, the ones that negatively impact both of these, are strong potentials that pull you in different directions, and tight loops.
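The residual-style scheduling mentioned above can be sketched like this. It is a simplified version of the idea, on the same kind of toy frustrated cycle with made-up numbers: keep a max-priority queue of directed edges keyed by how much each message would change if sent right now, always send the largest, and requeue the messages whose inputs it changed. Stale queue entries are simply left in place, which is harmless because resending a converged message is a no-op.

```python
import heapq
import numpy as np

# Toy frustrated 4-cycle with an illustrative node potential; numbers made up.
AGREE = np.array([[2.0, 1.0], [1.0, 2.0]])
DISAGREE = np.array([[1.0, 2.0], [2.0, 1.0]])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
psi = {(0, 1): AGREE, (1, 2): AGREE, (2, 3): DISAGREE, (3, 0): AGREE}
phi = {v: np.ones(2) for v in range(4)}
phi[0] = np.array([3.0, 1.0])

def pot(i, j):
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

def neighbors(v):
    return [u for (a, b) in edges for u in ((b,) if a == v else (a,) if b == v else ())]

def compute_message(i, j, msgs):
    inc = phi[i].copy()
    for k in neighbors(i):
        if k != j:
            inc = inc * msgs[(k, i)]
    m = (inc[:, None] * pot(i, j)).sum(axis=0)
    return m / m.sum()

def residual(i, j, msgs):
    # how much the message on (i, j) would change if sent right now
    return np.abs(compute_message(i, j, msgs) - msgs[(i, j)]).max()

def residual_bp(num_updates=400):
    msgs = {(i, j): np.ones(2) / 2 for e in edges for (i, j) in (e, e[::-1])}
    # priority queue of (-residual, edge); stale entries are skipped implicitly
    heap = [(-residual(i, j, msgs), (i, j)) for (i, j) in msgs]
    heapq.heapify(heap)
    for _ in range(num_updates):
        _, (i, j) = heapq.heappop(heap)
        msgs[(i, j)] = compute_message(i, j, msgs)
        # sending i -> j changes the inputs of every message leaving j
        for k in neighbors(j):
            if k != i:
                heapq.heappush(heap, (-residual(j, k, msgs), (j, k)))
        heapq.heappush(heap, (0.0, (i, j)))  # its own residual is now zero
    return msgs
```

Practical variants of residual BP differ in how the priority (the "disagreement" heuristic) is measured and in how stale queue entries are handled, but the greedy highest-residual-first idea is the common core.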
And we're not going to have time to go into all of these topics, but there are some newer algorithms that actually have considerably better behavior, both in terms of convergence and in terms of the accuracy of the results, and they are based on a view of inference as optimizing some distance to the true distribution. There are some really cool ideas, as well as math, in this, but much of it is outside the scope of this class.