Now, this too is done using EM, because the breakdown of a phone into these little constituent pieces is not labeled. And so it still requires that we use EM to address the issue of latent variables, but at least it's now self-contained, because you're only training a model for a single phone,

like "puh." With that model trained, one can now take entire words and use the model, initialized with the models trained on individual phones, to now train the higher-level model.

And one still retrains the phone HMM parameters in the context of this larger training regime, where one trains on entire words. But the fact that one seeded the model with this much more correct initialization of the parameters allows the segmentation in the E step to be performed moderately correctly, and gives rise to a much better local optimum in the speech recognition problem. Yet a different application is that of 3D

robot mapping. So, this is an application that's due to Thrun et al. Here the input to the problem is

a sequence of point clouds, obtained by a laser range finder, that were collected by a moving robot, and the target output is a 3D map of the environment as a set of planes. And you will see why we want the planar map when we look at the demo. Here the parameters of the model are the

locations and the angles of the walls, or the planes, in the environment, so we have no idea a priori where the walls are and how they're situated.

The latent variables are association variables, which assign points to walls. And so the EM algorithm effectively in the

E step figures out which points go with which wall, and in the M step, it figures

out how to move the wall around to better fit the points that were assigned to it.
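As an illustrative sketch (not Thrun et al.'s actual implementation; the function names and the hard-assignment variant are my own simplifications), this alternation between assigning points to walls and refitting the walls might look like:

```python
import numpy as np

def fit_plane(points):
    # M step for one wall: least-squares plane through its assigned points,
    # represented as (centroid, unit normal)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]          # direction of least variance = normal

def plane_distances(points, plane):
    centroid, normal = plane
    return np.abs((points - centroid) @ normal)

def em_planes(points, planes, n_iters=20):
    for _ in range(n_iters):
        # E step: associate each point with its nearest plane
        dists = np.stack([plane_distances(points, p) for p in planes])
        assignment = dists.argmin(axis=0)
        # M step: refit each plane to the points assigned to it
        planes = [fit_plane(points[assignment == k])
                  for k in range(len(planes))]
    return planes, assignment
```

In the full algorithm the associations are soft, and there is machinery for creating and deleting walls as evidence accumulates; this sketch keeps only the E/M alternation described above.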

So what we see here is the raw data collected by the robot.

The red box is the robot moving around in its environment.

The red beams that emanate from the robot are the directions that the laser took in order to collect the point cloud, and what we see here is the point cloud that was collected by the robot as it traversed the environment.

And even just looking at this image, we can already see that there is a lot of noise in the laser range data, and that is going to give rise, as we can see, to a very noisy map of the environment.

If we now look at the model constructed by EM: so now we are going to see the planar map constructed by the robot.

And this is done on the fly actually as the robot is moving.

We can see that walls are constructed when there is enough data to support them: when enough points are assigned to a wall, that wall gets constructed, and its pose in the environment gets determined by the EM algorithm. And so we can see that EM is able to construct a much more plausible and realistic map of the environment than just looking at the raw point cloud data.

A different application is also in the context of 3D laser range scans. And, you know, we pick those not because these are the most common applications of EM, but because they give rise to some pretty cool movies. So, here is the problem of getting, in this case, 3D range scans of a person in different poses. And the goal is to see whether, by collecting a bunch of these poses, one can reconstruct a 3D skeleton of the person, seen from the front and from the back.

So, in this particular case, the first problem is actually to correspond points in the different scans to each other. And in this case, that was done by a different algorithm that I'm not going to talk about now, although it also used graphical models; in fact, it used a belief propagation algorithm.

But now let's talk about the clustering problem, that is, the problem of assigning points in each of those scans to body parts.

So here we have the notion of a cluster, which in this case corresponds to a part. Each part, in a given scan of the person, has a transformation; that's why we have a plate around each of the parts, because for each of those parts, there are multiple instantiations, one for each of the multiple scans that we have of the same person in different poses.

Now, a landmark is a particular point on that person, in that part. And if we knew the part, that is, if we had the part assignment, which is a latent, unobserved variable, then we could predict how the point that we see on this scan would translate to the point on that scan, because that would be effectively a deterministic, or close to deterministic, function of the unobserved transformation of the part; that is, of the fact that the arms moved from this pose over here to this raised-arm pose over here.

So, given the transformation, and given that we know which part the point or landmark is assigned to, we can predict how the point transformed

from its original position to its new position.
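A minimal hard-assignment sketch of this step, assuming rigid per-part transformations (the function names and the Procrustes-style fit are illustrative assumptions, not the paper's exact model): the E step picks, for each landmark, the part whose transformation best explains its observed displacement, and the M step refits each part's transformation from its assigned landmarks.

```python
import numpy as np

def fit_rigid(src, dst):
    # M step for one part: rotation R and translation t that best map
    # the part's landmarks from the original scan (src) to the new scan (dst)
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - sc).T @ (dst - dc))
    R = vt.T @ u.T
    if np.linalg.det(R) < 0:      # enforce a proper rotation, not a reflection
        vt[-1] *= -1
        R = vt.T @ u.T
    return R, dc - R @ sc

def assign_parts(src, dst, transforms):
    # E step: each landmark is assigned to the part whose transformation
    # best predicts where the landmark ended up in the new pose
    errors = np.stack([np.linalg.norm(dst - (src @ R.T + t), axis=1)
                       for R, t in transforms])
    return errors.argmin(axis=0)
```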

So this is a way of clustering points into parts, and one can run EM on that. And if one does that, effectively, one gets pretty much garbage, because it turns out that there is enough muscle deformation to make the actual positions here fairly noisy, and it's very difficult to get correct part assignments from that. But if we now add an additional component into this model.

Specifically, continuity of space: we can now consider, say, two points that are adjacent to each other and impose the soft constraint, not a hard constraint but a soft constraint, that the part assignments of two points that are adjacent should be softly the same. They don't have to be exactly the same, because otherwise, of course, everything would be assigned to the same part; but it's a soft constraint, which is actually an MRF constraint.

In fact, it's even an MRF constraint which is associative, or regular.
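As an illustrative sketch of such a constraint (not the paper's inference procedure; the Potts-style penalty and the use of iterated conditional modes are my own simplifying assumptions), one can add, for every pair of adjacent points, a cost for assigning them to different parts:

```python
import numpy as np

def smooth_assignments(unary, edges, lam=1.0, n_iters=10):
    # unary[i, k]: cost of assigning point i to part k (e.g. transform misfit)
    # edges: pairs (i, j) of spatially adjacent points
    # lam: strength of the soft "adjacent points share a part" penalty
    n_points, n_parts = unary.shape
    neighbors = [[] for _ in range(n_points)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    assignment = unary.argmin(axis=1)
    for _ in range(n_iters):
        for i in range(n_points):
            costs = unary[i].astype(float).copy()
            for j in neighbors[i]:
                # penalize any label that differs from the neighbor's label
                costs += lam * (np.arange(n_parts) != assignment[j])
            assignment[i] = costs.argmin()
    return assignment
```

With lam set to zero this reduces to the per-point clustering described above; a positive lam lets confident neighbors pull an ambiguous point toward their part.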

As we discussed before. And so this model, which now does the clustering not of each point in isolation, but rather assigns points jointly to parts, taking into consideration the geometry of the person's body, gives rise to considerably better outputs. And what we see here is the algorithm in action. And we can see that the algorithm converges very nicely to a partition of the points into different parts, and from this one can easily reconstruct the skeleton.

A final application that uses EM in a different model is one that was used in this helicopter trajectory alignment demo. This is work that was done at Stanford by Andrew Ng's group.

And here, the input to the algorithm is basically different trajectories of the same aerobatic maneuver, flown by different pilots. So they were all trying to do the same thing, but each person has their own idiosyncratic way of doing this. And so the exact sequence wasn't, as

we'll see, exactly the same. The goal of this was to produce an output which aligns the trajectories to each other and, at the same time, learns a probabilistic model of what the target, or template, trajectory ought to have been.
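A rough sketch of that idea (a simplification, not the actual model from this work: here the latent alignment is a discrete time correspondence found by dynamic time warping, and the M step just averages the aligned observations into the template):

```python
import numpy as np

def dtw_path(a, b):
    # Dynamic-time-warping correspondence between two trajectories
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:            # backtrack the cheapest alignment
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def refine_template(template, trajectories, n_iters=5):
    # EM-style loop: the E step aligns every trajectory to the current
    # template (the latent time correspondence); the M step re-estimates
    # each template point as the mean of the observations aligned to it
    for _ in range(n_iters):
        sums = np.zeros_like(template)
        counts = np.zeros(len(template))
        for traj in trajectories:
            for ti, oj in dtw_path(template, traj):
                sums[ti] += traj[oj]
                counts[ti] += 1
        template = sums / np.maximum(counts, 1)[:, None]
    return template
```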