We're going to use a very simple diagram to demonstrate and to present the model.

Again we are on the search for a model for a signal X that has N dimensions.

What is N? For example if we consider 8x8 blocks.

As in jpeg we have that N is equal to 64 why 64?

We take an 8x8 block or image patch and we take one row after another.

We concatenate them and we get N = 64. This is the signal we are looking to

model. Now.

The next step and actually one of the main components of Sparse Modelling is

the Dictionary. The Dictionary is a matrix which is N by

K. N is the dimension of the signal we are

looking to model. K is the size of the dictionary.

We have K columns. Every column is called an atom.

It's kind of an image, kind of a patch. Because every column is 64-dimensional.

And we have K such columns. Very often, k.

Is larger than n. That's not necessary, but happens very

often. For example, when we are going to do

image renouncing as we are using as an example.

If case larger than n, then we say that the dictionary is over complete.

Now, if K is equal to N, then the dictionary's complete.

For example, Fourier or discrete cosine transform are complete dictionaries.

If K is less than N, we say that a dictionary is under-complete.

Now we have the dictionary. The second element of sparse modeling is.

This vector alpha. And what we have is a vector, alpha, that

has to have dimensions, K, and we multiply the matrix, D, the dictionary,

by the vector, alpha, and we produce the signal that we're trying to model.

What's particular of this vector alpha, is the number of non zero entries of the

vector is very small. We represent those by this red dot.

We have, at most, L non zero entries. And when we multiply this matrix, the

Dictionary D by alpha. What we are doing is having a linear

combination of the atoms. Corresponding to the non-zero entries.

So, we are combining, this is the second one.

It means it's picking the second atom. This is picking its corresponding atom.

And this is picking its corresponding atom.

And we basically have a linear combination according to the coefficients

here of those L atoms. And because these are only a few non-zero

coefficients, this vector is very sparse. And from that, the concept of sparse

modeling. So what we have is basically.

A signal. Represented as a product of the

dictionary, and a vector alpha. With only a few non zero entries.

A very sparse vector. Now, this model is, first of all, very,

very simple. Once we have the dictionary, d.

And alpha will obtain the signal just by a simple product.

This dictionary, we're going to discuss a lot about it in this video, in the future

videos. But for now, it's a fixed dictionary of

dimensions N by K, as we have seen in the previous slide.

Now, the model is extremely rich. Let's just do some numbers.

If we have a dictionary of size K, meaning K atoms, and we pick L.

We have L choose out of K possibilities. So if L is three we can pick any three

out of K. Once we pick them, each one of those

selections define a subspace, a very low dimensional subspace.

Once we pick, we do that selection, the signal, it's just a linear combination of

the corresponding atoms and that's a linear subspace.

But we have many of them. We have L out of K, a lot of them.

So the model is extremely, extremely rich and, hopefully, that will help us to

represent. Basically any signal of interest.

So, the model is simple, the model is very rich.

And also, the model is actually familiar to us.

This is not the first time that we see such a model of taking basically a signal

and representing it as a linear combination of a dictionary with only a

few atoms of that dictionary. Let's think for a minute, or for a few

seconds, when have we seen already this model?

We. Remember what we have learned in the last

few weeks. We have seen this model.

In JPEG. When, or how, in JPEG D.

Is basically the cosine basis. In JPEG we took and eight*8 block.

We did a transform into the cosine domain.

So d is the cosine basis. And alpha is basically the coefficient of

the cosine transform. And we saw that, for example, when we

quantize. A lot of the, these great cosine

transform coefficients became zero. So basically alpha was 64-dimensional.

But a lot of it was zero. So it was a very, very sparse vector.

And it achieved an extremely good representation of the eight by eight

patch. With just a few provisions, so JPEG is a

great example of the power of sparsity. Let me just give you another extreme

example of how we can represent signals in a very sparse fashion.

Consider that D includes all the signals of interest.

So if you're talking about images, this has huge matrix, K basically goes over

all the possible images. If that happens, then every time you want

to represent a signal x, an image. You only go and pick the corresponding

atom. The corresponding column in d.

And that means that our sparsity is one. So this extreme example is only to

illustrate that we can do sparse representations of signals.

Of course, it's not very practical; because imagine that you cannot put all

the patches or all the images in one dictionary.

That would be huge, and it won't be very useful.

The JPEG is, as we say, probably the most successful image processing algorithm,

and it's based on Sparse Representations. So, that's what we want.

We want to Sparse Representation of sigmas.

And the basic idea once again is that the signal is a linear combination of a few

atoms because alpha is sparse. How do we measure Sparsity, is the next

question? The way we are going to measure Sparsity

is with. This, which is called the LP norm for a

given P. Remember, alpha is nothing else than a K

dimensional vector. And what we are doing here with this formula is that we take

every entry of alpha, and we take the absolute value, and we elevate to the P

power, and we add all of them. And that's the LP norm of the vector.

Let's illustrate this for different values of P.

If p is equal to two, we don't really get what we want, because what we want is to

penalize. With an equal amount, every time one of

the entries of alpha is non-zero. And when it's zero, don't penalize at

all. So, that happens.

Zero, no penalization. But then the penalization increases

quadratically with actually demanding to lose the entry.

We don't want that. We want kind of an equal penalization.

As long as it's non-zero, it gets penalized.

What happen when P is equal to one. At least the penalization now is

proportional. Directly proportional and linearly

proportional to the magnitude of the entry, is not quadratic, that's better,

but doesn't look like it is good. We are going to see, later on that this

is actually a very good penalty function. This L1.

L for P = 1 It doesn't look like. But we are going to see that, under

certain conditions, this is exactly what we are looking for.

But before we get into that, let's keep looking for the ideal case.

If now, P becomes less than one, then we start to have what we really want.

A penalty of zero if the coefficient is zero.

And then the penalty starts to flatten out, meaning.

As long as the coefficient is non-zero, we get the same penalty.

If we further decrease p less than one, I'm getting close to zero, we get exactly

what we want. Zero penalty and equal penalty for

everybody that is non-zero. So the way we are going to measure

sparsity is with this function. For P equal zero.

And that's often written. As in this fashion with a zero here, and

a zero there. This is what's called a L0 pseudonorm.

It's not really a norm, and that's why it's called the pseudonorm, although very

often, it's called also the norm, with the understanding that it's not exactly a

norm, and this is how we are going to measure sparcity.

We are going to count the number of non-zero elements in the vector alpha.

[SOUND] So this is what we basically have here.

Signal represented as a linear combination of atoms from this, that

we're not allowed to pick more than l of those atoms.

So back to our problem of image denoising.

The Maximum-A-posteriori estimation of image denoising.

What we have, is that we want to represent our signal.

Very closely, this is the measurement and this is what we want to be.

We want to have a close representation, meaning very close.

X have to be as close as possible to Y as long as it belonged to the right model

and then we replace X by the alpha. Because we know that now we're going to

be representing the signal as. The dictionary multiplied by alpha that

not every alpha we basically must have the no more than L entries in alpha are

non zero. To give a.

Illustrative representation, and of course we get back.

The signal by the alpha, where alpha is the optimal.

So, in a, in a very simple pictorial representation, what we have is D alpha,

this is D, has dimensions N by K. This is alpha, that has dimensions K.

We'll subtract Y. That's aware approximation error.

But we are not allowed to pick any alpha. We're only allowed to pick alpha that has

at most L non-zeros, which means that we are going to be approximating Y by, at

most. L columns, the linear combination of at

most L columns of D. And from now on, the representation is

given by alpha. Instead of X we use alpha and we know if

we want to recover the signal we basically multiply by D.

Now what we have obtained is a very low dimensional representation of.

Our signal, and a low dimensional approximation of y.

And the idea is that, if we reduce the dimension of y.

Basically, we are getting rid of the noise.

Because we are not allowing to use all of the items.

We're only allowing to use very few, projecting the signal into a low

dimension. And therefore, reducing the noise.

It's the same as we saw with the points. We have points.

On the plane. That if we say that we are only allowed

to represent all these points with a straight line.

With feet of straight line and we're basically projecting the points onto the

straight line and by that we are denoising the points because we go into a

much lower dimension. In this case we just go from two to one,

but that illustrates the concept that we're going to a lower dimension of space

and by that we get rid of the noise. In order to represent the noise we will

need many additional atoms and that's why we are restricting alpha to be sparse.

Now. Are we done?

We are only at the very beginning of this week.

So for sure we are not done. Let me illustrate a few of the problems

that we need to still solve. Now.

We were talking about this, minimize the error under the constraint that you are

not allowed to pick more than L non zero entries in alpha.

Is this the problem we need to solve? Maybe we can solve it in this way.

Let us constrain here the error. You don't want more than the given error

and find the sparse's possible vector that achieves this error.

Or we can actually have a combination of the sparsity and error and try to

optimize for alpha such that it minimizes the sum of the two terms with a

coefficient lambda. So, we have three possibilities here.

Sometimes the three are equivalent. Sometimes they are not.

Which one shall we be using? Another problem.

In this case, a theoretical problem. Will we always find an alpha?

So, I give you a y. I give you a dictionary.

Will I be able to find, for example, an alpha that is sparse enough, and

represents this signal up to certain error.

Is that possible? And if I find it, will that be unique?

Will there be only one alpha? Will any alpha have the same support, the

same non-zero entries? When is that possible?

And of course what about a dictionary? Which dictionary should I use in order to

get alpha as sparse as possible? This is going to be something that we are

going to discuss quite a lot during this week.

And these are some of the questions these very nice sparse modelling and sparse

representations open. So what have we done?

We basically started with image denoising as an example and we say, we need to

model the signal, we need to understand what do we mean by image denoising, what

type of clean signal are we looking for. And that basically opened a question to

say, we need to define models. When we say we need to define models, we

basically. Describe a model, which is the sparse

model. The signal is a linear combination of a

few elements. A few atoms of a given dictionary.

Once we define that, are we done? Of course no.

We have all these questions that we just mentioned, theoretical, computational,

how do I compute that alpha and, of course, how do I compute a dictionary

that is appropriate for the type of signals that we are looking for?

So these are some of the questions that we are going to address during the next

few videos. And I'm looking forward to that.

This is, as I say, a very, very exciting topic.

And very, very active topic in image processing right now.

Thank you very much. Looking forward to seeing you in the next

video.