With probability 0.25, I'm going to have a length of two. That's my code, 01.

With probability 0.47, I'm going to have a length of one.

With probability 0.25, so for one quarter of the image, I'm going to have length

three. And with very small probability, I'm also

going to have length three. And this is equal to

1.81. Okay.

So, instead of eight bits per pixel, I now have 1.81 on average over my whole image.

I have basically saved lots of bits. Okay, I have gone from eight to

1.81. And that's a compression of

over four times, and that's what we have done.
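The average-length arithmetic above can be checked directly. A minimal sketch, assuming the "very small probability" is 0.03 so that the four probabilities sum to one (the lecture doesn't state the exact value):

```python
# Expected code length: sum over symbols of (probability) * (code length).
# The 0.03 is an assumed value for the "very small probability" so that
# the probabilities sum to one.
probs_and_lengths = [(0.25, 2), (0.47, 1), (0.25, 3), (0.03, 3)]
avg_bits = sum(p * length for p, length in probs_and_lengths)
print(round(avg_bits, 2))      # 1.81 bits per pixel on average
print(round(8 / avg_bits, 2))  # compression factor relative to 8 bits per pixel
```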

Basically, what we did is we exploited the fact that some of the pixel

values appear more often than others, and so I can give them shorter codes.

So, how do we do that? We do that using

something which is called Huffman coding. And I'm going to explain next how to do

it. So, let's just look once again at this

particular example, the best way to explain Huffman coding is just to use an

example, and I'm going to do that next. So, let's assume that we have six

different symbols.

And I order them. The first thing we have to do is order them according

to the probability. The highest probability first and the

lowest probability at the end. And if some probabilities are equal, it

doesn't matter which one we put first. So, for example, this symbol a2 might be

the pixel value 100. The symbol a6 might be the pixel value

77, and so on. So, basically,

I'm going to list the symbols here and the probabilities here.

And from the bottom, I start to add. I take the two lowest.

And basically, I add their probabilities. So, I add 0.04 plus 0.06, and that's

probability 0.1. And then, I do the list again.

So, none of these has changed, the only one that changed is the last one.

See what's happening here. What's happening here is with probability

0.4, I can get a2, with probability 0.3, I can get a6, with probability 0.1, I can

get a1, with probability 0.1, I can get symbol

a4, and with probability 0.1, I can get either a3 or a5.

I am going to have to do something to distinguish which one it is: is it a3 or

a5? And how we distinguish them is going to come in the next slide.

Now, we do this again.

We take the two lowest and then we add them.

And now, because it became 0.2, remember, we have to reorder all the time.

So, let me just describe that again. With 0.4, we can get a2.

With 0.3, we can get a6. Now, with 0.2, we can get either one of

these two, which is a4, a3 or a5.

I don't know which one, but I know that with probability 0.2 it's one of them.

And then, we have 0.1 again. We do that once more,

actually, two more times. We take the two lowest.

We get this, and then we once again add the two lowest, and we get this.

And once again, every time we add the two lowest and move into the next

column, we reorder everything. When we end up with only two, we stop.

And we're going to see why we stop. Now, it's time to give the code.

So, basically, what I have done here is redraw the graph that we had

in the previous slide. And so, this is the first column, this is

the second column, and the third column, the fourth column, and the fifth column.

And now, I'm going to go backwards, assigning codes.

So, basically to 0.6, I assigned zero. And to 0.4, I assigned one.

Before we go backwards, let me tell you what's happening.

When the decoder is going to see a zero, it's going to know that it's from the

group that led to this 0.6. When it sees a one, it's going to know

that it's from the group that led to this 0.4.

Then we're going to have to keep adding bits to say which one in that group.

And that's why we keep going backwards. So, we basically propagate back the zero.

And actually, the 0.6 came from 0.3 plus 0.3.

So, we add a zero and a one. So now, of course, we propagated the 0.4 here

and basically assigned it to this that came to this column, without adding any

bits. So, it keeps propagating without adding

any bits. So now, all the symbols that lead to this

group, and we're going to see it was actually only one symbol,

have this code. And all the symbols that propagated to this probability

have codes that start with 01.

So now, we keep propagating. We propagate the one to the one, the 00

to the 00. 01, we have to split because this 0.3

probability came from the addition of these two.

And then, we have 010, 011. And then we keep going.

We propagate the one to the one, the 00 to the 00; this 0.1, which has the

code 011, propagates here. And this one actually has to split.

Every time we split because when we were computing it, we were adding.

So, when we're going back, we are splitting.

Every time we are splitting, we have to add zero and one.

And so, we keep going, we keep going, and we get this result that says that the

symbol a2 is going to have the code 1. The symbol a6 is going to have the code 00

and so on. So, this is the code,

it's a variable length code. Not every symbol is represented by a code

of the same length. It actually has a different length and

the length depends on the probability. The higher the probability, the shorter

the code. And that's very easy to see because the

low probabilities were added, added, added, added and then, when we're going

back we have to split, split, split, split.

And remember, every time we split, we have to add

a bit to each symbol's code. So, basically, the code becomes longer and

longer. And we can actually go and compute, once

again, the average length of these codes. So, to repeat the process, we basically

order the symbols according to the probability,

that's what we did here. We start adding low probabilities.

So, we add these two, we get this, we reorder, and we keep going.

It's a recursive process so if you know how to go from one column to the next,

you know how to go all the way. And then, we come back.
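The recursive process just recapped, merging the two lowest probabilities, reordering, and then splitting bits back, can be sketched in a few lines of Python. This is a sketch, not the exact slides: with tied probabilities the merge order may differ from the lecture's, so individual code lengths can come out differently, but any valid Huffman code for these probabilities has the same average length, 2.2 bits per symbol.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}.

    Forward pass: repeatedly merge the two lowest-probability groups.
    The backward "split" pass is implicit: each merge prepends a 0 to
    one group's codes and a 1 to the other's.
    """
    # Heap entries: (probability, tiebreak counter, {symbol: code so far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # lowest probability
        p2, _, group2 = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + c for s, c in group1.items()}
        merged.update({s: "1" + c for s, c in group2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# The six symbols and probabilities from the lecture's example.
probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
codes = huffman_code(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)
print(round(avg, 2))  # 2.2 bits per symbol on average
```

Note how the highest-probability symbol, a2, ends up with the shortest code, exactly as described above.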

There are a couple of things that are very

The code that you get in this way is what's called prefix-free.

And that's very, very important. Look at this column.

When you see a one, when the decoder sees a one, it knows that it has to go here,

to a2. When it sees a zero, it basically says,

okay, I have to wait for the next bit. But there is no code here which is a

full prefix of any other code. And that's very important, otherwise, the

decoder won't be able to reconstruct. Let's give an example of something which

is not a prefix code. Let's say that the symbol a1 is

represented by the code one, the symbol a2 is represented by the code zero, and

the symbol a3 is represented by the code 01,

okay? So, when the decoder sees 01,

basically there are two possibilities. It's either a3,

let's say, pixel value 100 or is a2 followed by a1.

Let's say, pixel zero and pixel 255. So, there is a lot of ambiguity in the

reconstruction. We don't know which one it is.

And this is because this code is not what's called prefix-free.

This code, zero, is a prefix of this code, 01. That doesn't happen here.
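The ambiguity in that non-prefix-free example is easy to check mechanically. A small sketch (the `parses` helper is just for illustration):

```python
def parses(bits, inverse, prefix=()):
    """Enumerate every way to segment a bit string into codewords."""
    if not bits:
        yield list(prefix)
    for code, sym in inverse.items():
        if bits.startswith(code):
            yield from parses(bits[len(code):], inverse, prefix + (sym,))

# The non-prefix-free code from the example: "0" is a prefix of "01".
inverse = {"1": "a1", "0": "a2", "01": "a3"}
print(list(parses("01", inverse)))  # two valid parses: a2 then a1, or just a3
```

With a prefix-free code, this enumeration would always find exactly one parse.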

Let's just clear these annotations. None of these codes,

so, for example, the 00: you don't find it at the start of any other code.

So, when the decoder sees a one, it knows what to do.

It puts the pixel value of a2. When it sees a zero,

it says, oh, I'll have to wait, because there are many codes that start with

zero. If it then sees another zero, it will

say, oh, I'm done, I basically have a6. If, let's say, the decoder

receives zero and then one, it says, oh, I have to wait, because there are numerous

codes that start with 01. But then, it receives a one,

and it says, oh, I'm done. I don't have to keep waiting.

Because there's nothing. There's no ambiguity here.
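This wait-then-emit behavior of the decoder can be sketched directly. The codeword table below (a2=1, a6=00, a1=011, a4=0100, a3=01010, a5=01011) is one consistent assignment for this example; the slides may break ties differently, giving an equally valid table:

```python
def decode(bits, codes):
    """Decode a bit string one bit at a time. Because the code is
    prefix-free, a codeword can be emitted the moment it matches."""
    inverse = {c: s for s, c in codes.items()}
    out, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in inverse:        # complete codeword, no need to wait
            out.append(inverse[buffer])
            buffer = ""
    return out

# One consistent prefix-free codeword table for the lecture's example.
codes = {"a2": "1", "a6": "00", "a1": "011",
         "a4": "0100", "a3": "01010", "a5": "01011"}
print(decode("1" + "00" + "011", codes))  # ['a2', 'a6', 'a1']
```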

And that's very important.

Look how we basically split, and split, and split.

And by construction, a Huffman code is a prefix-free code.

Now, how do we know that this is an optimal code?

How do we know that, with this set of probabilities,

the procedure I just explained achieves the

lowest possible, the shortest possible, length of a code on

average? So, that's a theorem we would have to prove,

we're not going to do that in this class, but there is a theorem that shows that

Huffman coding basically achieves the shortest possible length.

Now, you might wonder what length Huffman coding can achieve, or basically

any encoder of this style. So, the basic idea is if I give you a set

of probabilities, how much can I compress?

Before I go and construct a Huffman code, I want to know: is it worth

the effort of constructing the code? Will I compress a lot?

And the way to do that is basically you compute what's called the

entropy. And the entropy,

which is written as H (H for entropy),

is the sum over all the symbols, with a minus sign in front that I'm going to explain

in a second, of the probability of every symbol times log base 2 of the probability of

that symbol. So, it's basically minus the sum of 0.4 times log

base two of 0.4, plus 0.3 times log base two of 0.3, and so on.

So, this is actually, in the end, a positive number, because all probabilities

are below one and, therefore, the logs of the probabilities are negative numbers,

and then with the minus sign, we get a positive number.
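For the six-symbol example above, this sum works out to about 2.14 bits per symbol, just below the 2.2 bits per symbol that the Huffman code for those probabilities achieves. A minimal sketch:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Probabilities from the six-symbol example.
H = entropy([0.4, 0.3, 0.1, 0.1, 0.06, 0.04])
print(round(H, 4))  # about 2.1435 bits per symbol
```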

And Shannon's first theorem basically says that, asymptotically,

we can achieve this code length on average.

So, this is basically what's achievable with these types of codes, like Huffman

code. We basically can compute this number and

say, oh, that's going to be my average code length.

My expected average code length. And it's actually not so hard to

understand where this formula comes from. We used a similar computation with the

previous example. Here, the probability is basically the

probability of the symbol appearing. And this is actually the length of the

code that is going to be used for that symbol.

So basically, what this is saying is that the ideal length of the code for that

symbol is minus the log of its probability. And why it's only an expectation is because

you cannot have a code which is 3.2 bits long. You have to send entire bits.

Either you send three bits or you send four bits.

And that's why this is, on average, the expectation,

what you expect the code length to be. But this basically gives you an idea of

what encoders like Huffman coding can achieve.

So, now that we know how to encode the bits that we are producing, we are ready

to go into the next blocks of image compression.