So how do you find a good elimination ordering? Fortunately, simple heuristics actually do pretty well. One standard way of finding a good elimination ordering is to do a greedy search, eliminating one variable at a time, where at each point in the elimination you use some kind of heuristic cost function to decide which variable you're going to eliminate next. There are some obvious cost functions to use, and it turns out that, surprisingly, they work pretty well.
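As a sketch, the greedy scheme can be written down in a few lines. Everything here, the adjacency-set representation, the function names, and the little chain example, is my own illustration rather than anything from the lecture; the current degree is used as a placeholder cost function.

```python
def greedy_ordering(graph, cost_fn):
    """Greedily pick the cheapest variable to eliminate next.

    `graph` maps each variable to the set of its neighbors. Eliminating a
    variable connects all of its neighbors pairwise (the fill edges), just
    as in variable elimination.
    """
    graph = {v: set(nbrs) for v, nbrs in graph.items()}
    order = []
    while graph:
        # Choose the variable the heuristic scores cheapest right now.
        v = min(graph, key=lambda x: cost_fn(graph, x))
        order.append(v)
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= nbrs - {u}   # fill edges among v's neighbors
    return order

# Placeholder cost: number of neighbors in the current graph.
min_neighbors = lambda g, v: len(g[v])

chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(greedy_ordering(chain, min_neighbors))  # → ['A', 'B', 'C', 'D']
```

Any of the cost functions discussed next can be plugged in for `cost_fn`; only the score the heuristic assigns to each candidate variable changes.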

So, for example, one cost function is what's called min-neighbors, which picks the node that has the minimal number of neighbors in the current graph. You want to pick the variable that's connected to the fewest other variables, so that it produces the smallest factor, that is, the factor whose cardinality, in terms of the number of variables, is smallest.

Min-weight accounts for the fact that sometimes you have variables that have 100 values and others that have two values, and if you just count variables, you're going to ignore that distinction. So min-weight computes the weight, that is, the total number of values, of the factor formed. Min-weight is a way of capturing the fact that if you have two variables, each of which has 100 values, you might nevertheless prefer a factor that has five variables, each of which has only two values: 10,000 entries versus only 32.

These heuristics just look at the size of the factors formed, and they ignore the fact that some of these factors might be inevitable. So a different strategy is to look at how much extra work is caused by the elimination ordering.

So min-fill is a strategy that asks: if I eliminate this node, how many pairs of nodes that weren't neighbors before become neighbors due to this elimination step? Here, basically, I'm counting added complexity above and beyond the complexity I would have had from the edges that were previously generated.
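These cost functions are easy to write down concretely. The sketch below is my own illustration, not from the lecture; the adjacency-set representation and `card`, a map from each variable to its number of values, are assumptions of the sketch.

```python
from itertools import combinations
from math import prod

def min_neighbors(graph, card, v):
    """Score: number of neighbors of v in the current graph."""
    return len(graph[v])

def min_weight(graph, card, v):
    """Score: total number of entries in the factor created by eliminating v."""
    return prod(card[u] for u in graph[v] | {v})

def min_fill(graph, card, v):
    """Score: number of fill edges, i.e. neighbor pairs of v not yet connected."""
    return sum(1 for a, b in combinations(graph[v], 2) if b not in graph[a])

# A four-cycle A-B-C-D, where A has 100 values and the rest have two.
g = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
card = {"A": 100, "B": 2, "C": 2, "D": 2}

print(min_weight(g, card, "C"))  # 8: the factor over {B, C, D} has 2*2*2 entries
print(min_fill(g, card, "C"))    # 1: eliminating C adds the new edge B-D
```

Note how min-neighbors would score A and C identically (both have two neighbors), while min-weight heavily penalizes any factor that includes the 100-valued variable A.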

And finally, there is a weighted version of the fill heuristic that takes into account not just the number of added edges but also how heavy they are, that is, the domain sizes of the variables that they connect. It turns out that min-fill, which is actually quite often used in practice, is a pretty good heuristic.

Now, one important way of understanding the issues of finding elimination orderings is by looking at the following result, which tells us that the induced graph produced by variable elimination, regardless of the elimination ordering, is triangulated. So what does triangulated mean?

Triangulated means that you cannot have a loop of size more than three that doesn't have a bridge, an edge that cuts across it in one direction or another, so that there is a way of crossing that loop without going all the way around. Let's convince ourselves that this is true. Here's a simple proof.
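The key step of the proof, that eliminating a node connects all of its remaining neighbors, can be checked on a tiny example. This sketch (plain Python adjacency sets, with node names of my own choosing) eliminates one node of a four-cycle and shows the fill edge that results:

```python
# A four-cycle A-B-C-D has no chord, so it is not triangulated.
# Eliminating any node, say C, connects its neighbors B and D,
# and that fill edge is exactly a chord of the loop.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}

def eliminate(graph, node):
    """Remove `node` and connect all of its neighbors pairwise (fill edges)."""
    nbrs = graph[node]
    new = {v: set(adj) - {node} for v, adj in graph.items() if v != node}
    for u in nbrs:
        for w in nbrs:
            if u != w:
                new[u].add(w)
    return new

after = eliminate(cycle, "C")
print("B" in after["D"])  # True: the fill edge B-D bridges the old loop
```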

So consider a set of variables, and assume by contradiction that there is a loop like this, of size greater than three. One of these variables has to be the first to be eliminated, so assume that C is first to be eliminated. Well, once we eliminate C, we end up introducing an edge between B and D, and so there has to be an edge between those two neighbors of C. Because when C is eliminated, the CD edge must exist and the CB edge must exist, since, as we observed before, you don't add any neighbors to C once you eliminate it. And so at that point a fill edge must be added, giving the loop a bridge.

It turns out, though, that one way to find

an elimination ordering, an alternative to the heuristics that I mentioned earlier, is to find a low-width triangulation of the original graph. There has been a lot of work in the graph theory literature on triangulating graphs, which we're not going to talk about, but it turns out that you can use all that literature for finding good low-width triangulations and then use those for constructing an elimination ordering.

So now, let's convince ourselves that this can make a big difference, by considering a practical application: Robot Localization & Mapping.

So what we're going to see here is a robot moving around, and at each point in time it sees several landmarks; it knows which landmarks it sees, because it can recognize them. What it senses at each point is some kind of approximate distance to the closest landmark.

Let's write that down as a graphical model. The graphical model looks something like this. We have, at each point in time, a robot pose: the robot pose at time zero, one, up to the end of the trajectory, T. And we have a set of landmarks whose positions are fixed; the landmarks don't move. So notice that these are not random variables that change over time: they're just there, and they're constant. What you see here in gray are the observations. So this, for example, assumes that at time one the robot saw landmark one, and so what we have here is the observation: the observed distance is a function of the robot pose and the landmark position. Here at time two, the robot saw landmark two, so we have this dependency over here. And at time three the robot also saw landmark two, so you have two observations for the same landmark. Here I didn't draw the fact that you could actually see more than one landmark at a given point in time, but that also is easy to add in.
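To see the effect of the ordering on a structure like this, here is a toy sketch. It is entirely my own setup, not anything from the lecture: a short pose chain with two landmarks, comparing the largest factor scope created when we eliminate poses first versus landmarks first.

```python
def run_elimination(graph, order):
    """Eliminate variables in `order`, returning the largest scope created.

    Eliminating v creates a factor over v and its current neighbors, then
    connects those neighbors with fill edges.
    """
    graph = {v: set(n) for v, n in graph.items()}
    largest = 0
    for v in order:
        nbrs = graph.pop(v)
        largest = max(largest, len(nbrs) + 1)
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= nbrs - {u}
    return largest

# Toy SLAM structure: a chain of poses x0..x5, two landmarks, and an
# edge wherever the robot observed a landmark from a pose.
T, graph = 5, {}
for t in range(T + 1):
    graph.setdefault(f"x{t}", set())
    if t > 0:
        graph[f"x{t}"].add(f"x{t-1}")
        graph[f"x{t-1}"].add(f"x{t}")
for lm, poses in {"l1": [0, 1], "l2": [2, 3, 4, 5]}.items():
    graph[lm] = set()
    for t in poses:
        graph[lm].add(f"x{t}")
        graph[f"x{t}"].add(lm)

poses_first = [f"x{t}" for t in range(T + 1)] + ["l1", "l2"]
landmarks_first = ["l1", "l2"] + [f"x{t}" for t in range(T + 1)]
print(run_elimination(graph, poses_first))      # → 4
print(run_elimination(graph, landmarks_first))  # → 5
```

Eliminating a landmark that was observed from many poses connects all of those poses together; that is exactly the kind of fill the min-fill heuristic tries to avoid, and the gap grows quickly as landmarks accumulate observations.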