To apply these identities,

I first rewrite the equation that we desire

to minimize from the previous slide as the first line of

the equation that you see here and we take

advantage of the fact that the derivative is a linear operator.

The expectation is also a linear operation,

which means that the expectation of a sum of things is equal to the sum of the expected values of those same things.

I can move a summation into and out of an expectation,

and I can move any linear operation into and out of an expectation; we do that quite frequently.
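As an aside, the linearity property being used here is easy to check numerically. The sketch below is not from the lecture; the two random variables and their distributions are made up purely for illustration, and sample means stand in for expectations:

```python
import numpy as np

# Numerical sanity check of linearity of expectation:
# E[A + B] = E[A] + E[B], illustrated with sample means.
rng = np.random.default_rng(0)
a = rng.normal(loc=1.0, scale=2.0, size=100_000)   # samples of random variable A
b = rng.normal(loc=-3.0, scale=0.5, size=100_000)  # samples of random variable B

mean_of_sum = np.mean(a + b)             # estimate of E[A + B]
sum_of_means = np.mean(a) + np.mean(b)   # estimate of E[A] + E[B]

print(mean_of_sum, sum_of_means)  # the two agree up to floating-point error
```

The same check works for any linear operation: applying it before or after averaging gives the same result.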

So, here, because the derivative operation is a linear operation,

I can say that the derivative of an expectation is equal to the expectation of the derivative.

I can move the derivative inside.

So, when we do that,

we need to compute the derivative of x transpose x with respect to x hat.

Notice that this must be zero, because x transpose x is not a function of x hat.

So, that first term goes away.

Then, the second term is minus two times x transpose times x hat,

and we need its derivative with respect to x hat; that's the first identity on the slide.

So, the Y transpose is equal to minus two x transpose in this case.

So, we compute that result.

Then, the final thing we need to find the derivative for is x hat,

there should be a transpose there,

times x hat, and that's the third identity at the top if we let the A matrix be the identity.

So, the answer to that derivative is two times the identity times x hat.
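Collecting the three derivatives just described, the step can be written out as follows. This is a reconstruction using one common matrix-calculus convention; the slide's own identities may be written slightly differently:

```latex
% The identities referenced from the slide, in one common convention:
%   d(y^T x)/dx = y,     d(x^T A x)/dx = (A + A^T) x.
% Differentiating each term of (x - \hat{x})^{\mathsf T}(x - \hat{x})
% with respect to \hat{x}:
\frac{\partial}{\partial \hat{x}}\left( x^{\mathsf T} x \right) = 0,
\qquad
\frac{\partial}{\partial \hat{x}}\left( -2\, x^{\mathsf T} \hat{x} \right) = -2\,x,
\qquad
\frac{\partial}{\partial \hat{x}}\left( \hat{x}^{\mathsf T} \hat{x} \right) = 2\,\hat{x}
```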

So, bottom line, we use this dictionary of three identities,

we apply it to this differentiation problem, and we come up with an expression,

which is the first expression on the second line.

Then, we recognize that the expectation of this summation is equal to

the summation of expectations because of linearity, and furthermore,

the expected value of a constant is the constant itself. So I write this as,

well, I've got minus two times negative x hat inside of the expectation given Y,

and since x hat is a constant, that's just going to be positive two times x hat.

Then, I also have the expected value of x given

Y multiplied by negative two, and I can't simplify that yet.

So, the second part of the second line is a simplification of the first part,

recognizing that expectation is linear.

Then, I set that equal to zero and I

move the two components over to different sides of the equation.

I can then write that the optimal state estimate of my system right now,

x hat k plus, is equal to the expected value of x k given

all measurements up to and including time k. So,

that's something pretty interesting.
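In symbols, this final step might look like the following. Here $\mathbb{Y}_k$ stands for the set of all measurements up to and including time $k$; the lecture's own notation may differ:

```latex
% Set the expectation of the derivative to zero and solve for the estimate.
\mathbb{E}\left[\, -2\,x_k + 2\,\hat{x}_k \,\middle|\, \mathbb{Y}_k \right] = 0
\quad\Longrightarrow\quad
\hat{x}_k^{+} = \mathbb{E}\left[\, x_k \mid \mathbb{Y}_k \,\right]
```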

We now have this equation for the optimal state estimate, and it might seem reasonable:

the best estimate I can give you is what I expect the state to be,

given all the measurements that I've made until right now.

The problem is that I can't implement it.

I can write it mathematically, and it's very simple looking,

but I don't know how to compute this expectation inside of a programming language.

So, you might consider that I could go back and look

at the definition of an expectation using integrals,

defining the conditional expectation and then trying to

approximate those integrals numerically. That's essentially what a particle filter is trying to do;

not exactly, but that's basically the idea of a particle filter.

So, it involves approximating integrals at every time step,

and that's very, very computationally intensive.
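To make that idea concrete, here is a rough sketch, not the lecturer's code, of how a particle-style method approximates the conditional expectation with a weighted sample average. The model is invented for illustration: a scalar state x drawn from a standard normal prior, and a measurement y = x + v with Gaussian noise v:

```python
import numpy as np

# Sketch: approximate E[x | y] by importance-weighting samples from the
# prior p(x) with the measurement likelihood p(y | x).
# Toy model (an assumption for this example, not from the lecture):
#   x ~ N(0, 1),  y = x + v,  v ~ N(0, sigma_v^2)
rng = np.random.default_rng(1)
n_particles = 200_000
sigma_v = 0.5

particles = rng.normal(0.0, 1.0, size=n_particles)  # draws from the prior p(x)
y = 0.8                                             # the measurement we received

# Weight each particle by the likelihood p(y | x_i), then normalize.
weights = np.exp(-0.5 * ((y - particles) / sigma_v) ** 2)
weights /= weights.sum()

# The weighted average approximates the integral E[x | y].
x_hat = np.sum(weights * particles)

# For this linear-Gaussian toy model the exact posterior mean is known:
# E[x | y] = y / (1 + sigma_v**2) = 0.8 / 1.25 = 0.64.
print(x_hat)
```

Notice that a large number of samples is needed for a decent approximation of just one scalar expectation, and a particle filter must redo this kind of computation at every time step, which is why the lecture looks for a cheaper route.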

So, we're going to look for a different way of doing it.

We desire to come up with a set of steps, implementable in

computer code, that somehow implement or compute this conditional expectation.

When we are successful in doing so,

we will have derived the Gaussian Sequential Probabilistic Inference Solution.