Hello and welcome back. During this video I want to talk about

the implementation of sparse modeling, in other words how do we compute the alpha

that we have seen in the previous videos? During the video we are also going to

discuss some theoretical results; I'm just going to mention their existence and

why they are so important. Let's start with that.

Assume that we have just produced a signal X as this product, basically as a

sparse model. So assume that we know the dictionary D

and we randomly pick an alpha with a few non-zero coefficients and we generate X.

And now our goal is to recover this alpha. So now we know the dictionary and we know the signal, and we want to compute its sparse representation.

That's our goal right now. As we said, later on we are going to discuss how to compute the dictionary itself.

As we have seen in the previous video, we want to compute the alpha that has the smallest possible number of non-zero coefficients, whose set of active locations is also called the support, and we assume that there is no noise right now.

We have just generated this signal, so it's kind of the ideal scenario.

So that's our goal right now. And there are a number of questions.

One of the questions is: with whatever algorithm we produce here, which we are going to explain in the next few slides, why would we expect to get back exactly the same alpha that produced the signal?

In other words, is that alpha unique? Can you think of a scenario where we know that alpha does not necessarily have to be unique?

And can you think of a scenario where we know that alpha will be unique?

Let's think for a second about that problem.

One example where alpha is not unique is when, for example, several of the columns of D are identical.

Let's assume that this was generated by one of those columns.

I can actually pick any other column, because they're identical, and I can replace that alpha by the alpha corresponding to that column.

Assume that columns one and two are identical, and that, basically, I generated this signal with exactly this pattern; I generated it with the second column. But when I go to reconstruct, I can pick column one instead of column two, and therefore I get a different alpha.

So that's an example where basically, the solution is not unique.
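To make this concrete, here is a tiny numerical sketch of that scenario; the dictionary here is made up, with its first two columns forced to be identical:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small made-up dictionary whose first two columns are identical.
D = rng.standard_normal((5, 4))
D[:, 1] = D[:, 0]

# Generate x with sparsity one, using only the second atom.
alpha_a = np.array([0.0, 3.0, 0.0, 0.0])
x = D @ alpha_a

# A different alpha, using the first atom instead, reproduces x exactly,
# so the sparse representation is not unique.
alpha_b = np.array([3.0, 0.0, 0.0, 0.0])
same = np.allclose(D @ alpha_b, x)   # True: two distinct sparse solutions
```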

An example where the solution is unique is the DCT, the Discrete Cosine Transform, or the Fourier Transform. We know that the transform of a signal in the Fourier or Discrete Cosine domain is unique.

So those are the two cases; now, where was the difference? In one of the scenarios, D basically had repetitions. In the other scenario, we know that for the Fourier and cosine transforms every column is independent of the rest; in fact, each is orthogonal to the rest.

So it looks like the structure of the dictionary D has a lot to do with the

uniqueness of alpha. Here comes the second question.

Will I get the same sparsity? In the previous case, at least the number of non-zero coefficients, meaning the size of the support, was the same. But could I maybe get an even sparser representation than the one that produced the signal? Once again, it's not hard to think of an example of that. Assume that you generated a signal with all these three atoms, and this is X. Now assume that, by chance, X itself was part of the dictionary D. Then when you go to try to recover it, your sparsity drops to one. So you went from three to one. What's the problem there, once again? One of the columns is basically a combination of the others; so if you produce a signal that has such a combination, you might be able to get it back with fewer coefficients. So these two problems, these two questions, actually depend a lot on a number of things: the dictionary, meaning its size k, and also its general structure.

And the basic idea, as we have seen from these examples, is that we want the dictionary D to have atoms, columns, that are as uncorrelated as possible, to guarantee that basically we have unique solutions and that we get back the sparsest alpha that actually produced the signal. So it depends on the sparsity level.

Basically, it depends on the number of non-zero coefficients, and on some conditions on the dictionary. There is beautiful theoretical research out there that guarantees uniqueness, and that guarantees we get basically the smallest possible support when we run the type of reconstruction algorithms that I'm going to show next. So it's important for us to know the existence of these very important theoretical results in the background, which support the type of techniques that we are developing and support the use and exploitation of sparse modeling.

Our goal now, assuming the theoretical results, is actually to solve sparse modeling. So here is what we want to find. We are given a signal y; remember, this is not the ideal case anymore. X is going to be represented by this; so this is our x. We want to find the representation of x, given the dictionary D, such that we get the smallest possible support, meaning the smallest possible number of non-zero coefficients, and we don't get too far away from the observation.

And that's our goal right now: to solve this problem. This is actually what's called a combinatorial problem, and it's NP-hard. You actually cannot solve this problem, as I just posed it, in realistic time. Let's just explain why.

We're going to basically do the following, which is the only way to solve this problem exactly. We start with L equal one; so we look for a representation that has sparsity one. That's what we are going to do first.

So, we generate all the possible supports of that signal. Because L is equal one, the support means only the first one is active, or only the second one is active, or only the third one is active, and so on.

Now, how many possibilities do we have? We have already seen that: we have l chosen out of k. In the case of l equal one, we have basically k possibilities. When we iterate this, because then we are going to do l equal two, we will have of the order of k squared possibilities, and as l increases, we are going to have more and more possibilities. But for now, we basically have all the possible supports. Now, with each one of those possible supports, we basically go and solve the fitting problem. So we say: okay, what happens if I only use the first atom? All that's left is to pick the scale.

How much of that first atom will I use? And then I solve this problem.

This is a least squares problem. It's a very simple quadratic problem.

There are many ways to solve it, but the basic idea is that we are going to project the signal onto the atom that we have just selected.
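For a single selected atom, that projection has a simple closed form. A minimal sketch (the function name here is my own, not from the lecture):

```python
import numpy as np

def best_scale(y, d):
    """Least-squares fit of y by one atom d: minimize ||y - c*d||^2 over c.
    The closed-form solution is the projection coefficient <d, y> / <d, d>."""
    return float(d @ y) / float(d @ d)

d = np.array([1.0, 0.0])
y = np.array([3.0, 4.0])
c = best_scale(y, d)            # c = 3.0
residual = y - c * d            # [0, 4], orthogonal to d
```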

We basically try each one of those and see what the error is. If the error is less than what we want, then we're done: we pick the alpha that produced that small error. If not, then we increase the support size. So, we have just tried with L equal one; now let's try with L equal two.

So we pick all the possible supports that have two active, two non-zero coefficients. So we say, okay: how can I approximate the signal Y with the first and second atoms together, or with the first and third atoms together, or with the seventh and the twenty-fifth atoms together? So we pick every possible pair.

Again, there are two chosen out of K, which is of the order of K^2, and we try all of them. For each one of them we solve this problem again: just a least squares problem, or a projection onto the subspace generated by the selected atoms. We try again, and we keep going and keep going; thus we solve this exactly.
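The whole exhaustive procedure can be sketched as follows. This is a toy illustration of the brute-force idea, only feasible for tiny dictionaries; the function name and tolerance are my own choices:

```python
import itertools
import numpy as np

def exhaustive_sparse_code(y, D, eps=1e-6):
    """Brute-force L0 sparse coding: for growing support size l, try every
    support of size l, solve the least squares fit restricted to those atoms,
    and stop at the first support whose residual falls below eps."""
    k = D.shape[1]
    for l in range(1, k + 1):
        for support in itertools.combinations(range(k), l):
            Ds = D[:, list(support)]
            coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
            if np.linalg.norm(y - Ds @ coef) <= eps:
                alpha = np.zeros(k)
                alpha[list(support)] = coef
                return alpha
    return None
```

On a small random dictionary this recovers a planted sparse alpha; on anything of realistic size it is hopeless, which is exactly the point made next.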

Basically, absolutely no problem. So what's the problem with this technique?

The problem with this technique is the following.

Assume that K = 1000; that's a standard size. It's a good size; as I said, it's about the size that we're going to be using when we talk about image denoising. And assume that I tell you there have to be ten atoms. So basically it's ten chosen out of 1000. The number of possibilities starts being about the order of 1000 to the tenth, something like that.

So, it's a very, very large number. Assume that each one of the computations that we have here takes about one nanosecond: you try all possible supports of size ten, and you solve each of these least squares problems in one nanosecond.

These are all reasonable numbers. What you get is that you're going to need

a long, long time to solve this problem. Look at this number.

It's a huge number. That's how much time you're going to need to solve the problem if I already told you that L equals ten. Imagine what would happen if you have to run over all possible L's. It's impossible; that's why the problem is very hard.
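That back-of-the-envelope number can be reproduced directly; the one-nanosecond-per-solve figure is the assumption from above:

```python
import math

k, l = 1000, 10                        # dictionary size, assumed sparsity
n_supports = math.comb(k, l)           # about 2.6e23 candidate supports
seconds = n_supports * 1e-9            # one nanosecond per least squares solve
years = seconds / (365.25 * 24 * 3600)
print(f"{n_supports:.2e} supports, roughly {years:.1e} years")
```

which comes out to millions of years for just this one value of L.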

It's impossible to solve as it is. So, are we done with this week? We just presented a great model that we cannot solve? No, of course we are not done.

Now we are going to have to find a way to solve this.

How do we do that? There are actually a couple of fundamental ways to solve this problem; again, a good fit with as sparse a representation as possible. There are two ways to do that. One way is what's called relaxation methods: can we change this for something that is solvable, something that we can actually do in our computer? Meaning, can we relax our condition in such a way that it's almost the same, but we can solve it?

The other way is to use what are called greedy algorithms, and that means: okay, forget about finding all of them at the same time. Give me the most important atom, then the second, then the third, and so on.
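As a preview of the greedy idea, here is a minimal sketch of matching pursuit, one classic greedy scheme (assuming the columns of D have unit norm; this is an illustration, not the exact algorithm developed later):

```python
import numpy as np

def matching_pursuit(y, D, n_atoms):
    """Greedily pick, one at a time, the atom most correlated with the
    current residual, and add its contribution to alpha."""
    residual = y.astype(float).copy()
    alpha = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual             # correlation with every atom
        j = int(np.argmax(np.abs(corr)))  # the most important atom first
        alpha[j] += corr[j]
        residual -= corr[j] * D[:, j]     # remove that contribution
    return alpha
```

With an orthogonal dictionary, this recovers an exactly sparse representation in as many steps as there are active atoms.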

So let us basically briefly present both of these techniques.

Relaxation methods are also sometimes called basis pursuit methods. This is our problem, this is what we want to solve, and relaxation methods will change this. Look at what I'm doing: I'm replacing the zero norm, or pseudo-norm, by the one norm. Now I'm not just counting the number of non-zero coefficients; I'm basically counting with their magnitudes.
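To make the relaxed problem concrete: one standard way to solve the l1-penalized version, minimizing (1/2)||y - D·alpha||^2 + lambda·||alpha||_1, is iterative soft-thresholding (ISTA). This is my own illustrative sketch, not the lecture's algorithm; lambda and the iteration count are arbitrary choices here:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(y, D, lam=0.05, n_iter=500):
    """Alternate a gradient step on the quadratic fit term with a
    soft-threshold step, solving min_a 0.5*||y - D a||^2 + lam*||a||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / (largest singular value)^2
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ alpha - y)
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha
```

The soft-thresholding step is what makes the one norm, unlike the zero norm, computationally tractable: many coefficients are driven exactly to zero, which is why the relaxation still produces sparse solutions.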