0:00

As we said, there are many different kinds of queries that one can use a graphical

model to answer. Conditional probability queries are very

commonly used, but another commonly used type of inference is called MAP

Inference. So, what is MAP inference?

MAP is a shorthand for Maximum a Posteriori, and it's defined as

follows. Here, we have a set of

observations, little e, over some variables E, and we have a query Y.

And it turns out that, for computational reasons that we're not going to discuss

for the moment, it's important that the set of Y's is everything, all of the

variables other than E. So now, our task is to compute

what's called the MAP assignment, which is the assignment, little y, to the

variables Y that maximizes the conditional probability of the variables

Y given the evidence. Now, for those of you who haven't seen

the notation arg max: arg max is the y that provides the

maximal value to this expression, over here.
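To make the notation concrete, here is a tiny sketch with a hypothetical conditional distribution; the assignment names and numbers are made up purely for illustration:

```python
# Hypothetical conditional distribution P(Y | E = e) over three assignments.
P_y_given_e = {"y0": 0.2, "y1": 0.5, "y2": 0.3}

# max_y P(y | e): the maximal value of the expression itself.
max_value = max(P_y_given_e.values())

# arg max_y P(y | e): the assignment y that achieves that maximal value.
map_y = max(P_y_given_e, key=P_y_given_e.get)

print(max_value, map_y)  # 0.5 y1
```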

Now, note that in some cases, this maximizing value might not be unique.

That is, there might be several different assignments, say y1 and y2, that give

the exact same probability. And so, the MAP is not necessarily a

unique assignment. This has many applications, some of which

we've discussed before. So, in the context of message decoding,

for example, where we have a set of noisy bits that are passed over

a noisy communication channel, what we often want to get is the most

likely message that was transmitted. That is, an assignment to the transmitted

bits that is the most likely given our evidence.

In the context of image segmentation, we would like to take the pixels and figure

out the most likely assignment of pixels to semantic categories.

So both of these can be viewed as MAP assignment problems.

And an important thing to understand about MAP is that it really is a

different problem than conditional probability queries.

So to understand that, let's look at the following very simple

example over just two random variables. So here, we have a Bayesian network over

the variables A and B. And if you multiply the two CPDs, you

get the joint distribution shown over here.

And it's fairly immediate to see that the MAP assignment is this one,

because it has the highest probability. Can we get at the MAP assignment by

looking separately at the variable A and at the variable B?

So if we look at that, we see that if we look separately at the variable A,

the most likely assignment to the variable A is A1.

As opposed to the MAP assignment, where the variable A took the value A0.

And so, one can't look separately at the marginal over A and over B, and use that

to infer the MAP assignment. And the reason is that we're looking for

a single assignment over all of the variables that together has the

highest probability. Unfortunately, just like conditional

probability inference, this problem too is NP-hard.
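The point of the two-variable example can be checked numerically. Here is a minimal sketch for a network A → B with hypothetical CPDs; the numbers are illustrative, not the ones on the slide:

```python
# Hypothetical CPDs for a two-variable Bayesian network A -> B.
P_A = {"a0": 0.4, "a1": 0.6}
P_B_given_A = {"a0": {"b0": 0.1, "b1": 0.9},
               "a1": {"b0": 0.5, "b1": 0.5}}

# Joint distribution: P(A, B) = P(A) * P(B | A).
joint = {(a, b): P_A[a] * P_B_given_A[a][b]
         for a in P_A for b in ("b0", "b1")}

# MAP assignment: the single (a, b) pair with the highest joint probability.
map_ab = max(joint, key=joint.get)

# Maximizing each variable's marginal separately gives a different answer.
marg_A = {a: sum(p for (x, _), p in joint.items() if x == a) for a in P_A}
marg_B = {b: sum(p for (_, y), p in joint.items() if y == b) for b in ("b0", "b1")}
separately = (max(marg_A, key=marg_A.get), max(marg_B, key=marg_B.get))

print(map_ab)      # ('a0', 'b1'), with joint probability 0.36
print(separately)  # ('a1', 'b1') -- not the MAP assignment
```

With these numbers, the per-variable maxima pick a1 and b1, but the pair (a1, b1) has probability 0.30, which loses to the coherent joint assignment (a0, b1) at 0.36.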

So, again, let's formalize what exact problem is NP-hard in this context.

And so here is one example of an NP-hard problem in the

context of MAP, which is just finding the joint assignment with the highest

probability. It's not the only NP-hard problem.

Here is another one: figuring out, for a given

probabilistic graphical model and some threshold little p,

whether there exists an assignment, little x, whose probability is greater

than p. That problem, too, is NP-hard.

So, should we give up? Well, just like in the context of

conditional probability queries, the answer is no. And there are algorithms

that can solve this problem very efficiently, in fact, for an even broader

set of problems than for conditional probability queries.

So, let's again look deeper into this problem and understand the foundations of

what might make it tractable. So, going back to our example of a Bayesian

network, once again we're going to view CPDs as

factors. So here, P of C again translates into a

factor over C just like before. And whereas, in the case of conditional

probability queries, we wanted to sum out some of the variables, marginalize

them. Now, we're going to want to find the arg

max, which is the assignment to these variables that maximizes the product.

And so, here we have a max of a product, which is why this is often called a max

product problem. Let's break down the max product problem.

So, imagine that we have, in the more general case,

the probability over Y given E equals little e. And let's remind ourselves that Y

is the set of all variables other than the ones in E.

And so, by the definition of conditional probability, we have this ratio,

where we're trying to find the Y that maximizes this ratio.

And notice that the denominator is constant with respect to Y.

Which means that, for the purpose of

finding the maximizing Y, we don't really care about the denominator, only the

numerator, which is the probability of Y, E = e, the unnormalized

measure. This numerator in the general

case is a product of factors normalized by the partition function, where the

factors here are the reduced factors, reduced relative to the evidence.
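As a sketch of the resulting optimization, here is a brute-force max-product over two hypothetical reduced factors; the factor names and values are assumptions for illustration, and exhaustive enumeration is exponential in general, which is what practical algorithms avoid:

```python
from itertools import product

# Hypothetical factors over binary variables Y1, Y2, already reduced by the
# evidence E = e. Any nonnegative factor values would do.
phi1 = {(0,): 0.3, (1,): 0.7}                     # factor over Y1
phi2 = {(0, 0): 5.0, (0, 1): 1.0,
        (1, 0): 1.0, (1, 1): 2.0}                 # factor over (Y1, Y2)

def unnormalized(y1, y2):
    # Product of reduced factors. The partition function and P(e) are
    # constants with respect to y, so they do not affect the arg max.
    return phi1[(y1,)] * phi2[(y1, y2)]

# Max-product: maximize the factor product over all joint assignments.
map_y = max(product([0, 1], repeat=2), key=lambda y: unnormalized(*y))
print(map_y)  # (0, 0): score 0.3 * 5.0 = 1.5 beats every other assignment
```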

7:44

We also have message-passing algorithms, which again are a direct analog to

algorithms for sum-product. And they give rise to a class of

algorithms called max-product belief propagation.

However, because the MAP problem is intrinsically an optimization problem,

one can also call in techniques that are specific to optimization.

And the class of such methods includes methods that are based on integer

programming, which is a general class of optimization

over discrete spaces. It turns out that this class of methods

that build on integer programming techniques, is one of the most popular

methods in the last few years and has given rise to a whole new range of MAP

algorithms that are considerably better than many of the previous algorithms

developed up to that point, especially for the approximate case.

It also turns out that for certain types of networks, some of which we'll

discuss, there are specific algorithms that are very efficient for that

particular class of graphical model. And one of those, perhaps the most

commonly used if not the only one, is methods based on a class of algorithms

called graph cuts.

And finally, because it's an optimization problem, one can also use standard search

techniques over combinatorial search spaces,

and there are problems for which this is also a very useful and successful

solution. So, to summarize: the MAP problem aims to

find a single coherent assignment of highest probability.

And that means that it is not the same as maximizing individual marginal

probabilities, as we saw in the example. One can reformulate this problem as one

of finding the max over a factor product. And this is a combinatorial optimization

problem which admits a whole range of different solutions, of which some are

exact and others approximate.