And the moral of the story, of course, is that this example exposes an interesting

algorithmic opportunity. So the obvious, quote unquote, solution

of a perfectly balanced search tree, need not be the best search tree when

frequency of access is non uniform. You might want to have unbalanced trees,

like this chain if it gets extremely frequent searches, closer to the roots to

have smaller search time. So the obvious question is then, given a

bunch of items and known frequencies of access, what is the optimal.

Search tree. Which search tree minimizes the average

search time? So that brings us to the formal problem

definition. We're told n objects that we gotta store

in a search tree and we're told the frequency of access of each.

So let's just keep things simple and the notation straightforward, let's just say

the items are named from one, two, all the way up to n, and that is also the

order of their respective keys. So key one is the frequency of searching

for the item with the smallest key, and so on.

You might wonder where these frequencies come from.

How would you know exactly how frequently every possible key will be searched for.

It's going to depend on the application. And you know there will be applications

where you're not going to have these kinds of statistics.

And that's where you'd probably want to turn to a general purpose balanced binary

search tree solution, something like a red black tree, which guarantees you that

every search is going to be reasonably fast.

But it's not hard to think about applications where you are going to be

able to learn pretty accurate statistics about how frequently different things are

searched for. One example might be something like a

spell checker. So if you would implement that by storing

all of the legally spelled words in a search tree, and as you're scanning a

document, and every time you hit a word, you look it up in the search tree to see

if it's correctly spelled or incorrectly spelled.

You can imagine that after, you know, scanning through a number of documents,

you would have pretty good estimates, about how frequently things get searched

for. And then you could use those estimates to

build a highly optimized binary search tree for all future documents.

If you're in some other kind of application where you're concerned about

these frequencies changing over time, so, for example, if they're privy to trends

in the industry, you could imagine rebuilding the search tree every day or

every week or whatever, based on the latest statistics that you've got.