So here, just to make our illustration easy, we use n = 4. These are the bases: if I go back to the formulas and set u = 0 and v = 0, I get a cosine which is a function of only x and y. The formulas were functions of u, v, x and y, but now I fix u and v, I make both of them 0, and I get a function of x and y alone. That actually gives me a 4 x 4 image, which would be this one. If I fix v = 1 and u = 0, I get this one, and so on.

So basically, these r and s are nothing else than what are called basis functions. And what we're trying to do is decompose our small n x n image into a linear combination of these basis functions.

So if your image is flat, your T(u, v) is going to be nonzero only for T(0, 0). If your image has a lot of variation, you're going to have coefficients also for T(3, 3). So each one of the T(u, v) tells you how much of this type of component you have.

Okay? So basically you have represented your n x n image as a linear combination of images of this type.
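As a sketch of this decomposition (my own illustration in NumPy, not code from the lecture), we can build the n = 4 basis images and verify that any 4 x 4 image is exactly the linear combination of them weighted by its coefficients T(u, v):

```python
import numpy as np

n = 4
k = np.arange(n)[:, None]          # frequency index u (or v)
x = np.arange(n)[None, :]          # spatial index x (or y)

# Orthonormal 1-D DCT-II basis vectors, one per row
alpha = np.sqrt(np.where(np.arange(n) == 0, 1.0 / n, 2.0 / n))[:, None]
C = alpha * np.cos((2 * x + 1) * k * np.pi / (2 * n))

# 2-D basis images: B[u, v] is the outer product of rows u and v of C
B = np.einsum('ux,vy->uvxy', C, C)

# u = v = 0 is the flat (DC) image; larger u, v vary faster
assert np.allclose(B[0, 0], 1.0 / n)

# Any 4 x 4 image f decomposes as f = sum_{u,v} T(u, v) * B[u, v]
rng = np.random.default_rng(0)
f = rng.standard_normal((n, n))
T = C @ f @ C.T                     # forward 2-D DCT: the coefficients
recon = np.einsum('uv,uvxy->xy', T, B)
assert np.allclose(recon, f)
```

The same construction scales directly to n = 8, the block size JPEG uses.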

The important part is that these images are constant. Again, in the Karhunen-Loeve transform they won't be, and that makes it very difficult to have an efficient hardware implementation able to do it in real time.

Here, this basis is fixed. You give me an image, I'm going to

represent it as a linear combination of this basis.

And T(u, v) tells me how much of each one of these I have in my image, in my tiny n x n image. So, that's a transform.

Basically, it's nothing else than going to coefficients that tell us how much of each one of these components we have. So I decompose my image into these components, which are kind of frequency components.

You see that there is more variation as I increase v and as I increase u in the

vertical or horizontal direction. There's one thing I haven't told you.

I said, okay, I can't do the Karhunen-Loeve transform, I cannot compute bases for the Karhunen-Loeve because it's too expensive, so I go and do a discrete cosine transform. Why a discrete cosine transform? Why not a Fourier transform, a Hadamard transform, or any other transform? There are many transforms out there. So there are a couple of reasons why we use the DCT. One of them is that, although I want to do a Karhunen-Loeve transform, I can do a DCT instead.

It turns out that the DCT is, for particular cases, actually exactly equal to the Karhunen-Loeve transform. Those particular cases are when the image is what's called Markovian. So a pixel depends basically on the pixel

next to it in a very particular form, and again, the pixel next to the pixel

next to it. So those are images that are called

Markovian. So if you assume something about your image, you can show that the Karhunen-Loeve transform ends up being the DCT. So, at the end, it is a constant basis like the one we see here. That's one reason to decide to use the DCT. The other reason that we use the DCT is

because of the following. Remember that I'm working with the whole image divided into n x n blocks. Okay?

Someone may wonder: why not use, for example, the Fourier transform?

If we go back to the theory of the Fourier transform, when you do a discrete Fourier

transform, you're making an underlying assumption of periodicity.

So you are basically assuming that your image repeats itself as we can see here.

So this comes back here and this comes back here.

Okay? So there is a repetition of your image.

That's the underlying mathematical assumption to be able to formally do a

Fourier transform of this 8 x 8 block. And here, I'm drawing just one line of

the 8 x 8 block, but the same assumption basically goes in the other direction and

then in two dimensions. Now, this basically means that when

your block ends, you're expecting the pixel in the next block to basically be

identical to the pixel that was here. Okay? So you expect this block to basically repeat here and repeat here. That's your underlying assumption.

It's not very probable that the pixel value here is the same as the pixel value here; they're not even close, maybe, in an 8 x 8 block.

On the other hand, DCT assumes a different type of periodicity.

Basically DCT assumes that at the boundary, you have mirror symmetry.

So basically, your image repeats, but it kind of folds back on itself. So the assumption says, okay, I'm assuming that the pixel here is similar to the pixel right next to it. Not that the pixel is identical to the one that comes 8 pixels later, just similar to the pixel right next to it, and that's a much more reasonable assumption.
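One way to see this mirror-symmetry assumption concretely (my own sketch, using the standard unnormalized DCT-II definition): the DCT-II of one 8-sample line equals, up to a half-sample phase factor, the DFT of that line concatenated with its mirror image:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(N)                      # one line of an 8 x 8 block

# DCT-II of x, straight from its defining cosine sum
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
C = np.cos(np.pi * k * (2 * n + 1) / (2 * N)) @ x

# Mirror extension [x0..x7, x7..x0]: even symmetry at the block boundary,
# which is exactly the periodicity the DCT implicitly assumes
y = np.concatenate([x, x[::-1]])
Y = np.fft.fft(y)[:N]

# Up to a half-sample phase shift, the DFT of the mirrored signal
# reproduces the DCT coefficients
C_from_fft = 0.5 * np.real(np.exp(-1j * np.pi * np.arange(N) / (2 * N)) * Y)
assert np.allclose(C, C_from_fft)
```

So the DCT is, in effect, a Fourier transform of the folded signal, which is why its boundary assumption is the gentle mirror one rather than a hard wrap-around.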

So, those are two of the main reasons why we use a DCT.

One is because for certain types of images the DCT is the Karhunen-Loeve.

The other is because the type of periodicity assumption is much more applicable to images when we split them into these n x n or 8 x 8 blocks.

So now, we have the DCT. We started with an image.

We divided it into n x n blocks. We apply a discrete cosine transform, where every coefficient tells us how much of each basis image we are using.

I keep talking about 8 x 8. How did we come to 8 x 8? Actually, there were a lot of studies before JPEG was defined that led to 8 x 8,

but let me just illustrate what happens here.

This is an image where we took the whole image; in this case, we don't divide it. This is already a block of that image, Lena, that we talked about before. We did a discrete cosine transform of this entire block, and we took 25% of the coefficients.

So we took basically only 1/4 of the coefficients, and we made everybody else

0, and then we did the inverse, and that's the result that we got.

We have introduced error, and remember, what the DCT is going to try to do is bring us into a domain where introducing error, basically killing some of the coefficients by making them 0, or making errors on them, won't be as noticeable. But in this case, we did it very simply: we just took 25% of the DCT coefficients. Which ones will become very clear soon.

But just imagine that we took the 1/4 largest

coefficients. Now, first we did it for the whole image.

Here, we actually took only 2 x 2 blocks. So we went and divided the image into 2 x 2 blocks, we did a DCT of every 2 x 2 block, and we took 25% of the coefficients. So we basically ended with only one coefficient per block, and we came back.

And you see this very, very annoying blocking artifact.

Then we did the same for 4 x 4 and the same for 8 x 8, and we see that basically it gets better and better.
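A minimal sketch of this block-size experiment (my own code; the smooth test pattern and the 25% keep fraction are illustrative choices, not the lecture's data):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal n x n DCT-II matrix; rows are the 1-D basis vectors."""
    k, x = np.arange(n)[:, None], np.arange(n)[None, :]
    alpha = np.sqrt(np.where(np.arange(n) == 0, 1.0 / n, 2.0 / n))[:, None]
    return alpha * np.cos((2 * x + 1) * k * np.pi / (2 * n))

def blockwise_dct_approx(img, n, keep=0.25):
    """Per n x n block: 2-D DCT, zero all but the largest `keep` fraction
    of coefficients, inverse 2-D DCT. Image sides must be multiples of n."""
    C = dct_matrix(n)
    out = np.empty_like(img, dtype=float)
    m = max(1, int(round(keep * n * n)))       # coefficients kept per block
    for i in range(0, img.shape[0], n):
        for j in range(0, img.shape[1], n):
            T = C @ img[i:i+n, j:j+n] @ C.T    # forward transform of block
            thresh = np.sort(np.abs(T), axis=None)[-m]
            T[np.abs(T) < thresh] = 0.0        # kill the small coefficients
            out[i:i+n, j:j+n] = C.T @ T @ C    # inverse transform
    return out

# A smooth 32 x 32 test pattern; compare mean squared error per block size
yy, xx = np.mgrid[0:32, 0:32]
img = np.cos(xx / 6.0) + np.sin(yy / 9.0)
for n in (2, 4, 8):
    mse = np.mean((img - blockwise_dct_approx(img, n)) ** 2)
    print(f"{n} x {n} blocks: MSE = {mse:.5f}")
```

Note that with 2 x 2 blocks and keep = 0.25, each block ends up with exactly one coefficient, matching the "only one" case described above.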

So, you might wonder, okay, 2 x 2 is too little, but why stop at 8 x 8? Why not go to 16 x 16 or 32 x 32? And, you know, why not just do the DCT of the entire image?

There are a couple of reasons for that. One is computational: basically, it's cheaper to do multiple small DCTs than one large DCT. So if you have, let's say, a 16 x 16 image, I'd rather do four 8 x 8 DCTs than just one 16 x 16 DCT. That's one of the reasons.
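The computational point can be made with a back-of-the-envelope count (my own sketch; I'm counting multiplies for a naive separable implementation, two n x n matrix products per block, which is about 2n^3 multiplies):

```python
def dct2_mults(n):
    """Multiplies for a naive separable n x n 2-D DCT: two matrix products."""
    return 2 * n ** 3

# One 16 x 16 tile: four 8 x 8 DCTs vs a single 16 x 16 DCT
small = 4 * dct2_mults(8)
large = dct2_mults(16)
print(small, large)   # the four small transforms cost half as much
```

Fast DCT algorithms change the constants, but the same scaling argument favors many small blocks over one big one.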

Another reason is this magic approximation of DCT to the

Karhunen-Loeve. I said that happens under certain

conditions, a Markovian condition.
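This claim can be probed numerically (my own sketch; ρ = 0.95 is an arbitrary illustrative correlation): build the covariance of a first-order Markov signal, take its eigenvectors, which form the KLT basis, and compare them with the DCT basis vectors:

```python
import numpy as np

n, rho = 8, 0.95
idx = np.arange(n)

# Covariance of a first-order Markov (AR(1)) signal: R[i, j] = rho^|i-j|
R = rho ** np.abs(np.subtract.outer(idx, idx))

# KLT basis: eigenvectors of R, rows sorted by decreasing eigenvalue
w, V = np.linalg.eigh(R)
klt = V[:, ::-1].T

# Orthonormal 1-D DCT-II basis, same ordering: DC first, then rising frequency
k, x = idx[:, None], idx[None, :]
alpha = np.sqrt(np.where(idx == 0, 1.0 / n, 2.0 / n))[:, None]
dct = alpha * np.cos((2 * x + 1) * k * np.pi / (2 * n))

# |inner product| between each KLT vector and the matching DCT vector;
# each entry should be near 1 when rho is close to 1 (signs may differ)
match = np.abs(np.sum(klt * dct, axis=1))
print(np.round(match, 3))
```

As ρ approaches 1, the match tightens, which is the sense in which the fixed DCT basis stands in for the data-dependent KLT.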

If you're in a small region, an 8 x 8 block, that condition, that Markovian condition, will probably apply. But if you look at a very large image, the whole image won't satisfy the Markovian condition. So basically, in small regions you expect

that assumption to hold, but it doesn't hold in large regions and it's not hard

to see that. If you're just, let's say, looking at me right now, you see that there's a lot of correlation here, more or less the same gray values. But if you start moving away and ask, is there a correlation between the pixel value here and the pixel value there, very far away? There's actually no correlation. I could change my shirt, and without prior information you couldn't tell. So there is no correlation when you get far away in your image; pixels that are far apart basically start being decorrelated. So you need to get to a compromise. On

one side, you want very small blocks because you want to be computationally very efficient. On the other hand, you don't want blocks that are too small, because you get into the problem of not having enough information in the block, enough correlation for the transform to be able to decorrelate. And if you go to blocks that are too large, you start paying in computation, and you start paying for violating the assumptions

for which the DCT is good. So 8 x 8 is a choice that, as I said, was significantly studied; it turned out to be a very, very good compromise, and that's why it is used today in JPEG. So, to conclude this part of the DCT, I

just want to show you once again this image, Lena. And these are just two

different cases, but don't worry about that for the moment.

Here, basically we took 8 x 8 blocks, we did a DCT, and we kept 12.5% of the coefficients, and then we came back. No quantization yet, which is going

to be our next step, just a DCT and the inverse DCT.

And it looks pretty good. At first glance, we have actually achieved compression, because we only kept 12.5% of the coefficients, and we got an image which looks almost identical to the original one, and that's because we did it in the DCT domain.

If we were to instead throw away a lot of pixels here, keeping only 12.5% of them, we would not be able to get this image quality.

This is just the error between the original and the reconstructed.

And as you see, it's a relatively small error.

So this is pretty good compression by nothing else than doing a DCT,

basically removing a lot of coefficients and coming back.

So now, we have the front end and we have the back end.

What's missing is the middle. I'm in the DCT domain; can I achieve even

more compression by doing quantization in a smart fashion?

And that's going to be the subject of our next video clip.