0:00

We described the belief propagation algorithm as passing message over cluster

graph but we left unspecified how that cluster graph would be constructed and

what properties it needs to satisfy in order to support reasonable message

passing. So let's remind ourselves what cluster

graphs are. A cluster graph is an undirected graph

whose nodes are clusters that involve subsets of variables and edges involve a

subset SIJ, which is a subset of the two clusters on

the twin points. What are some properties that the cluster

graph has to satisfy? The first property is the one called

family preservation an obvious value. where remember that we need, given a set

of factors phi, we need to be able to assign each phi K to some cluster,

C alpha K, such that the fact, such that the cluster can accommodate.

The scope. So can accommodate 5K.

oops accommodate.. Well, in order for the cluster graph to

allow this to be done, we need to have an appropriate cluster in the cluster graph.

So, this imposes a constraint on the cluster graph that says that for every

factor phi K in my set of factors phi, there exists some cluster CI that, such

that CI accommodates phi K, which means that you can put phi k inside

this cluster. That's the family preservation property.

The second property is a little bit trickier to understand.

It's called the Running Intersection Property.

The Running Intersection Property, let's first read the definition and then

understand what it says. It says that let's assume that we have a

pair of clusters CI and CJ and a variable X that sub, that belongs to both of them.

So for example we might have the variable X sitting here in C7, and we might have

the variable X sitting here, in C5. This property says that there exists a,

exists a unique path between Ci and Cj for which all clusters and subsets along

the path contain x. What does that mean?

It means that for example should I choose this to be my unique path.

It means that X needs to sit here, here and here.

So, there is a connecting path that involves X and that path is along the

entire route and that path is unique. So let's try and understand the intuition

between both sides of this definition, the existence and the uniqueness.

Imagine that I, let's do the existence first.

And imagine that for whatever reason I decide that there is not going to be a

path over here. So there is no way for C7 to communicate

to C3 about the variable X. Well, that means that we now have these

two separate, isolated communities, each of which has some information about X and

they can never talk to each other about X.

So they're never going to get to agree about X, there's never going to be any

information transfer, of these two pieces of information.

Well, so that's not very good, which is why we need, that path to exist.

Okay, so that's the exists part. What about the unique part?

The unique part is a little bit trickier to understand but let's but let's try and

provide some intuition for it. Let's imagine that I have two paths that

involved X. Let's so now let's think about this

message passing algorithm. So C3 can send the message.

C5 with information about X. For example, I think X is taking the value one, I have

a, I have a factor that suggests that X takes the value 1.

Well, C5, you know, integrates that with it's own information and sends it on to

C2. Which sends it back to C3.

And now C3 hears from C2 that ooh, X needs to take the value one.

Huh? That reinforces its beliefs that X needs

to take the value one and so the probability goes up.

It now goes back and sends that information on to C5, which sends it on

to C2, which sends it back to C3, and then once again the probability in X

taking the value 1 goes up. And so we have this sort of

self-reinforcing loop that's going to give rise to very extreme and very skewed

probabilities in many examples. And so a way of avoiding that is to or at

least reducing that risk is to is to prevent these kinds of feedback loops.

Now, it's important to realize that by preventing these loops that only reduces

opposed to eliminates the problem. And specifically this is kind of a little

bit of a digression but it's important to know,

is that, imagine for example that we have X, and Y, that are very strongly

correlated. So here, we have for example X and Y and

here we have a path. That involves Y.

Okay? So now what's going to happen is that C3

sends information to C5 about X. C5, translates that to information about

Y, Y should take the value one. That information about Y goes back to C3

and increases the probability in X taking the value one.

Now this is a little bit of a forward looking hint about some of the issues

that we'll see with belief propagation later on, which is that belief

propagation does very poorly. When we have strong correlations.

6:33

And that's because of these feedback loops.

The more skewed the probabilities in your graphical model, the harder time belief

propagation has in terms of the results that it gets.

So, with that digression, aside, let's go back to the running interception

property, and let's provide an alternative definition of the running

interception property, just to give ourselves some additional intuition.

The running interception property is equivalent to saying that for any X.

The set of clusters and subsets that contain X form a tree, so for example if

we have X here, here, here, here, here and here.

We can see that the set of a cluster is in sub sets that contain x form a tree.

It has to be connected, because of the existence of the path.

And it can't be, a non tree. Because, because that would give us two

different paths. And so that's a different view of this.

And so you can think of each variable inducing its own little tree.

across which information about that variable flows in the graph Now, let's go

back with that definition and consider some cluster graphs that we might adopt

for for this example that we've seen before.

So, here we have our five clusters so let's check whether it satisfies the

running interception property. We've already done family preservation

for a particular set of factors. So, let's consider for example the

variable, B, and we can see that we have B here, here, here, here,

here, here and here, and we can see that, that is a tree.

Here's my tree. Note, very carefully, that B isn't here.

If B were here that wouldn't satisfy the running intersection property.

That would be an illegal cluster graph, that's actually on the next, on a

subsequent slide. Here is an illegal cluster graph.

It violates the running intersection property Y.

Because here is B, and here is a bunch of other B's and there's no way to connect

cluster two to any of the others. So this one violates the existance.

9:08

This one, which we just talked about, violates the uniqueness because here we

have the, all of the clusters and subsets that involve b, and there's this nice

little loop over here that has two paths between say, cluster one and cluster

four. If we wanted to take the cluster graph

and still allow communication between B and C, so we want to have, for example we

want cluster one and cluster two to be able to transmit to each other

information about how B and C are correlated with each other, which they

can't do in this graph over here. So, one way to do that is to we have, we

have eliminated in this case, this edge that we have her that involves B, and now

once again we have B. Being a tree in the graph now focused on

be here but its not difficult to convince yourself that other that other nodes also

satisfied that we satisfy the running intersection property also with respect

to other variables so just as an example here is a set of clusters and subsets

involved in D and too is a tree and here is the one's involving E and that too is

a tree and so on. So and running intersection property

needs to hold for every for every one of the variables.

How do we construct a cluster graph that has, that has a desired properties.

One very simple and in some ways degenerate but still because of its

simplicity very often used is a cluster graph called the beta cluster graph.

And that's a term from statistical physics where people use this kind of

approximation to energy functions in in some, in some calculations, in, in

Statistical Physics. So here, we have in the beta cluster

graph, we have two types of clusters. We have big clusters and little clusters.

The big clusters, these are the big clusters correspond, to factors in phi.

12:09

So if we consider the set of factors that we had before, we can now produce a

different cluster graph then one that we had previously.

This is a beta cluster graph. Notice that we have these big clusters.

These are these four, I'm sorry, these five.

And we have these, little clusters corresponding to A, B, C, D, E, and f.

And we can see that we have an edge for example, between the ABC cluster and the

A cluster, the B cluster, and the C cluster.

And that's how messages are passed. And you can see how this graph is

degenerate in the sense that it only allows information about Singleton's to

be passed. It loses, in every message passing step,

any information about the correlation between variables.

But, nevertheless, it's simple to construct and it's guaranteed to satisfy

the running interception property. Why is that?

So let's consider for example, all of the factors that involve the variable, umm,

D. So here is, what are the clusters that

involve the variable D? Well, there's this one.

And then there's this, that, that. Those are the set-, where we have the

subsets, which I didn't mark. And then these ones.

And notice that it's a tree by definition.

Because d doesn't appear on any other subset, except these ones.

And so the trees is a star. It's the variable cluster plus the big

factors that contain the variable. So d, and, in this case, the three

clusters that contain it. Look to summarize.

We have, we know, we looked at the properties of the coaster graph, we see

there is two of them that it needs to satisfy, the first is family preservation

which allows the set of factors to be encoded and the second is the running

intersection property which serves two purposes, the first is to connect all

information of any single variables that it can all be transmitted through the

graph but without creating type feedback loops that are going to create the self

reinforcement and highly inaccurate answers.

And the beta cluster graph is often the first default that people use.

Because it's easy to define. And it's guaranteed to be correct.

But richer cluster graph structures of the type that we talked about can offer

very different and sometimes significantly better trade-offs.

With respect to, on the one hand, the computational cost.

And on the other hand. which of course, as you start increasing

the amount of, of the sizes of the messages that are passed.

that can grow more expensive. But at the same time, allow us to

improve. dip the preservation of dependencies as

messages are passed in the graph so that more information is actually kept and not

lost in this message passing process.