So I know you're probably catching your breath from that last computation.

So let's zoom out. Let's make sure we don't lose the forest

for the trees. And see that we're actually mostly there.

We're just one lemma away from finishing the Proof of Theorem.

The proof of correctness of Huffman's algorithm.

So what do we have going for us. Well, first of all we've got the

inductive hypothesis. Remember and any proof by induction you better rely very

heavily on the inductive hypothesis. What does it say?

It says when Huffman's algorithm recurses on the smaller subproblem with the

smaller alpha bit signal prime, it solves it optimally.

It returns a tree which minimizes the average encoding length with respects to

the alpha bit sigma prime in the corresponding frequencies.

So, we're going to call that tree, the output of the recursive call, T prime

hat. So let me amplify the power of this

inductive hypothesis by combining it with the computation we did in the last slot.

So what do we know so far? We know for this smallest subproblem

which frankly, we don't care about in its own right.

But, nevertheless, for this smaller sub problem, with alphabet sigma prime, the

recursive call solves it optimally. It returns to us, this tree T hat prime.

And among all trees with leaves inable according to sigma prime.

This tree is as good as it gets, it minimizes the average encoding length.

But what have we learned in the last slide?

We learned that there's a one to one correspondence between feasible

solutions, between trees for the smallest subproblem with the alphabet sigma prime,

and feasible solutions to the original problem, the one we care about, that have

a certain form in which, it just so happens that A and B are siblings, that

they share a common parent. Moreover, this correspondence preserves

the objective function value, the average encoding length or at least it preserves

it up to a constant which is good enough for our purposes.

So the upshot is, in minimizing average encoding length

over all feasible solutions for the smallest subproblem, a recursive call is

actually doing more for us. It's actually minimizing the average

encoding length for the original problem with the original alphabet sigma over a

subset of the feasible solutions, the feasible solutions in which A and B are

siblings. Now, does this do us any good?

Well, it depends, what was our original goal?

Our goal was to get the smallest average encoding if possible over all feasible

solutions in the world. And so what have we just figured out?

We figured out, well we're getting the best possible scenario amongst some

solutions. Those in which A and B happen to be

siblings. So, we're in trouble if there is no

optimal solution in which A and B are siblings.

Then it doesn't do us any good to optimize only over these crappy

solutions. On the other hand, if there is an optimal

solution lying in this set XAB in which A and B are siblings, then we're golden.

Then there's an optimal solution in this set, while recursive call is finding the

best solution in the sets. So, we're going to find an optimal

solution. So that's really the big question.

Can we guarantee that it was safe to merge A and B in the very first

iteration? Can we guarantee there is an optimal solution, a best possible tree

that also has that property? That also has A and B as siblings.

So the key level, which I approve in the next slide in which we'll complete the

proof of the correctness of the apartment spherum, is indeed there is always an

optimal solution in the set XABA an optimal solution on which A and B, the

two lowest frequency symbols are indeed siblings.

So I'll give you a full-blown proof of this Key Lemma in the next slide.

But let me first just orientate you and you know, just develop your intuition

about why you'd hope this Key Lemma would be true.

The intuition is really the same as that as when we developed the greedy algorithm

in the first place, namely when you which symbols do you want to incur the longest

encoding lengths? Well, you want them to be the lowest

frequency symbols, that you want to minimize the average.

So if you look at any old tree, there's going to be you know, some

bottommost layer. There's going to be some deepest possible

leaves. And, you really hope that A and B, the

lowest frequency leaves are going to be in that lowest level.

Now they might both be in the lowest level and not, and might not be siblings.

But, if you think about it, amongst symbols at the same level, they're all

suffering exactly the same encoding length anyways.

So you can permute them in any way you want and it's not going to affect your

average encoding length. So once you put A and B at the bottommost

level you may as well put them as siblings and it's not going to make any

difference. So, the final punchline being, you can

take any tree like say an optimal tree. And by swapping.

Changing the lowest frequency elements A and B to the bottom and to be common

siblings, you can't make the tree worse you can only make it better, and that's

going to show that there is an optimal tree with the design property where A and

B are in fact siblings. All right, so that's a reasonably

accurate intuition, but it's not a proof. Here is the proof,