Now in case you're feeling cocky about the fact that the greedy algorithm works

to solve the seemingly similar optimal prefix-free binary code problem in the

form of Huffman's algorithm, I want to spend a little time pointing out that

greedy algorithms are not sufficient, are not correct, to solve the optimal search

tree problem. So if we were to design a greedy

algorithm what kind of intuition would motivate a particular greedy rule.

Well, staying at the objective function it's very clear that we want the objects

that high frequency of access to be at or near the root and we want items of low

frequency access to be in the bottom levels of the tree, like the leaves.

So what are some ways we can compile this intuition into a Greedy algorithm?

Well one, perhaps motivated by the success of Huffman's algorithm, is we

could think about a bottom up approach. Now I'm not going to define what I mean

here precisely, but informally we want to start with the bottom most levels, with

the leaves and the nodes you want to put there are the objects that are accessed

the least frequently. Any reasonable way of implementing this

kind of body of greedy rule is not going to work.

Let me show a simple counter example. So, let's just assume we have four

objects, one, two, three, four. What I'm showing here on the right in

pink is two possible search trees valid for those four keys.

And let's assume that, the frequencies are as followed.

Object one is searched for two% of the time.

Object two for 23% of the time. Object three, the bulk of the time 73%.

An object for only one% of the time. Any greedy algorithm which insists on

putting the lowest frequency objects in the very bottommost level of the tree is

not going to produce this tree on the right, which has the two% object below,

at a lower level than the !% object. Instead, such a greedy algorithm would

presumably output a searchtree like the one on the left, which has the two as the

root, and then the object four, at the lowest level, at depth two.

But you should be able to easily convince yourself that, for these probabilities,

it's the tree on the right which is the one you want, that's optimal, because the

object three is the one that's searched for the bulk of the time, that's the one

you want at the root as opposed to the object two.

So I realize I'm being a little informal here but I hope you get the idea that a

naive bottom-up implementation of a greedy algorithm, which if you think

about it is really what we did in Huffman's algorithm, is not going to work

here. The same can be said about a top-down

approach. Perhaps the simplest top-down approach

would be just to take the most frequently searched for object and put that at the

root and then recursively develop an appropriate left and right sub-trees

under that most frequently accessed element.

So let me show you again and, formally, just the same kind of counter-example.

We're going to use the exact same four objects, the exact same two trees.

I, I will however, change the numbers. Now, let's imagine that object one is

searched for almost never, just one percent of the time and each of the other

three objects are searched for roughly a third of the time each.

But, let me sort of break ties, so that the object number two is the most

frequently one. Searched for 34%.

So that, in that case the Greedy algorithm will put the 34% node up in the

roots when really what should happen is you want a perfectly balanced sub-tree

for the objects two, three and four because each accounts for roughly a third

of the searches. So let's give object three 33% of the

searches and object four 32% of the searches.