0:00

So, in this video, we're gonna take it at least partway to the next level for the

heap data structure. That is, we'll discuss some of the implementation

details. I.e., how would you code a heap data structure from scratch? So

remember what the point of a heap is. It's a container, and it contains objects, and

each of these objects, in addition to possibly lots of other data, should have

some kind of key. And we should be able to compare the keys of different objects; you

know, Social Security numbers for different employees, edge weights for

different edges in a network, or time stamps on different events, and so on. Now

remember, for any data structure the number one thing you should remember is

what are the operations that it supports, i.e., what is the data structure good for?

And also, what running time can you count on for those operations? So, as we

promised in a previous video, and as we'll flesh out in this video, there are

two primary operations that a heap exports. First of all, you can insert stuff

into it, and that takes time logarithmic in the

number of objects that the heap is storing. And second of all, you can

extract an object that has the minimum key value. And again, we're going to allow

duplicates in our heaps, so if there are multiple objects that all have a

common minimum key value, the heap will return one of those; it's

unspecified which one. As I mentioned earlier, you can also dress up heaps with

additional operations. For example, you can do batch inserts in linear time,

rather than the logarithmic time per element you'd get from repeated single

inserts, and you can delete from the middle of the heap. I'm not gonna

discuss those in this video; I'm just going to focus on how you'd implement

insert and extract-min. If you wanna know how heaps really work, it's important to

keep in mind simultaneously two different views of a heap: one as a tree, and

one as an array. So we're gonna start on this slide with the tree view. Conceptually,

this will be useful to explain how the heap operations are implemented.

Conceptually, we'll think of a heap not just as any old tree, but first as a tree

that's rooted. Second, it'll be binary, meaning that each node will have zero, one,

or two children nodes. And third, it will be as complete as possible. So let me draw for

your amusement an as complete as possible binary tree that has nine nodes. So if the

tree had only seven nodes, it would have been obvious what "as complete as

possible" means: we just would have had three completely

filled-in levels. If it had fifteen nodes, it would have had four completely

filled-in levels. If you're in between two such numbers that are powers of two

minus one, well, we're going to call the tree as complete as possible as long as, in the bottom

level, you fill in the leaves from left to right. So here the two extra leaves on the

fourth level are both pushed as far to the left as possible. So, in our minds,

this is how we visualize heaps. Let me next define the heap property. This

imposes an ordering on how the different objects are arranged in this tree

structure. The heap property dictates the following: at every single node of this

tree (it doesn't matter if it's the root, a leaf, or an internal node), at

every node X, the key of the object stored at X should be no more than the keys of X's

children. Now, X may have zero children if it's a leaf, it may have one child, or it

may have two children; whatever the case, all of those

children's keys should be at least the key at X. For example, here is a heap with

seven nodes. Notice that I am allowing duplicates. There are three different

objects that have the key value four, in this heap. Another thing to realize is

that while the heap property imposes useful structure on how the objects can be

arranged, it in no way uniquely pins down their arrangement. So this exact same set of

seven keys could be arranged differently and it would still be a heap. The

important thing is that, in any heap, the root has to have a minimum value key. Just

like in these two arrangements of these seven keys, the root is always a four, the

minimum key value. So that's a good sign, given that one of the main operations

we're supposed to quickly implement is to extract the minimum value. So at least

we know where it's going to be: it's gonna be at the root of the heap. So while in our

minds we think of heaps as organized in a tree fashion, we don't literally implement

them as trees. In something like search trees, you actually have pointers at each

node, and you can traverse pointers to go from a parent to the children and from

the children to the parents, yada, yada, yada. It turns out it's much more efficient

in a heap to just directly implement it as an array. Let me show you by example how a

tree like we had on the last slide maps naturally onto an array representation. So

let's look at a slightly bigger heap, one that has nine elements. So let me draw an

array with nine positions, labeled one, two, three, all the way up to nine. And the

way we're going to map this tree, which is in our mind, to this array implementation

is really very natural. We're just going to group the nodes of this tree by their

level. So, the root is gonna be the only node at level zero. Then the children of

the root are level one, their children constitute level two, and then we have a

partial level three, which is just these last two nodes here. And now we just stick

these nodes into the array, one level at a time. So the root winds up in the

privileged first position, so that's going to be the object which is

the first copy of the four. Then we put in the level-one objects, so that's the second

copy of the four and the eight, and then we put in level two, which has our third

four along with the two nines. And then we have the last two nodes from level three

rounding out the penultimate and final positions of the array. And you might be

wondering how it is we seem to be having our cake and eating it, too. On the one

hand we have this nice, logical tree structure. On the other hand we have this

array implementation and we're not wasting any space on the usual pointers you would

have in a tree to traverse between parents and children. So where's the free lunch

coming from? Well, the reason is that, because we're able to keep this binary

tree as balanced as possible, we don't actually need pointers to figure out who

is whose parent and who is whose child. We can just read that off directly from the

positions in the array. So, let me be a little bit more specific. If you have a

node in the i-th position (and I'm assuming here that i is not one, right, since the

root does not have a parent), then any other object in

position i does have a parent, and what that position is depends in a simple

way on whether i is even or odd. If i is even, then the parent is just in

position i/2, and if i is odd, then i/2 is a fraction,

so we take the floor; that is, we round down to the nearest

integer. So for example, the objects in positions two and three have as

their parent the object in position one, and those in four and five have the one in

position two as their parent. Six and seven have as their parent the

object in position three, and so on. And of course we can invert this function: we can

equally easily determine who the children are of a given node. So if we have an

object in position i, then the children of i are gonna be at

positions 2i and 2i+1. Of course those may be empty: a leaf

doesn't have any children, and there may be one node that has only one child.
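
These parent and child calculations are easy to write down explicitly. Here's a sketch in Python, assuming the heap occupies positions 1 through n of an array; the function names are mine, not from the lecture:

```python
# Index arithmetic for a heap stored in a 1-indexed array.
# These formulas are the only "pointers" a heap ever needs.

def parent(i):
    # Integer division rounds down, so this handles both cases at once:
    # for even i this is exactly i/2, and for odd i it's the floor of i/2.
    return i // 2          # equivalently, a bit shift: i >> 1

def left_child(i):
    return 2 * i           # equivalently, i << 1

def right_child(i):
    return 2 * i + 1       # equivalently, (i << 1) | 1
```

For example, positions two and three both map back to position one, and positions four and five both map back to position two, matching the picture above.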

But in the common case of an internal node it's gonna be two children that are gonna

be in positions 2i and 2i+1. So rather than traversing pointers, it's very easy to

go from a node to its parent or to either one of its children, just by doing

the appropriate trivial calculations with respect to its position. So this

slide illustrates some of the lower-level reasons that heaps are quite a

popular data structure. The first one is just in terms of storage: we don't need

any overhead at all. We have these objects, and we're storing them directly

in an array, with no extra space. Second of all, not only do we not need

space for pointers, we don't even have to do any traversing. All we do are these

really simple divide-by-two or multiply-by-two operations, and using bit-shifting

tricks, those can also be implemented extremely quickly. So, in the next two slides,

let me indicate, at a high level, how you would implement the two exported

operations, namely insert and extract-min, in time logarithmic in the

size of the heap. Rather than give you any pseudocode, I'm just going to show

you how these work by example; I think it will be obvious how they extend to the

general case. I think just based on this discussion, you'll be able to code up your

own versions of insert and extract-min, if you so desire. So let's

redraw the 9-node heap that we had on the previous slide and again, I'm gonna keep

drawing it as a tree and I'm gonna keep talking about it as a tree but always keep

in mind that the way it's really implemented is in terms of this array, and when I talk

about the parent of a node, again, what that means is you go to the appropriate

array position, computed from the position of the node where you started. So let's suppose

we have an existing heap, like this blue heap here, and we're called upon to insert

a new object, let's say with a key value k. Now remember, heaps are always supposed

to be perfectly balanced binary trees. So if we want to maintain the property that

this tree is perfectly balanced, there's pretty much only one place we can try to put the new

key k, and that's as the next leaf. That is, it's going to be the new rightmost leaf

on the bottom level. Or, in terms of the array implementation, we just stick it in

the first empty slot in the array. And if we keep track of the array size, we of

course know in constant time where to put the new key. Now whether or not we

can get away with this depends on what the actual key value k is. But, you know, for

starters we can say, what if we insert a key that's seven? Well, then we get to

say, whew, we're done. So, the reason we're done is cuz we have not violated the

heap property. It is still the case that every node has key no bigger than that of

its children. In particular, this third copy of a four picked up a new child,

but that child's key, seven, is bigger than its own key, four. So, you can imagine that maybe we

get lucky with another insert. Maybe the next insertion is a ten. Again, we put

that in the next available spot in the last level, and that becomes the second

child of the third copy of the four. And again we have no violation of the heap

property: no worries, we've still got a heap. And in these lucky cases, insertion is even

taking constant time. Really, all we're doing is putting elements at the end of an

array and not doing any rearranging. So where it gets interesting is when we do an

insertion that violates the heap property. So let's suppose we do yet another

insertion, and the left child of this twelve becomes a five. Now we've got a

problem. We still have an as-perfectly-balanced-as-possible binary tree, but the heap

property is not satisfied. In particular, it's violated at the node twelve: it has

one child, and the key of that child is less than its own key. That's no good. So is

there some way we can restore the heap property? Well, a natural idea is just to

swap the positions of the five and the twelve, and that's something that of

course can be done in constant time, cuz again from a node, we can go to the parent

or the child in constant time just with a suitable trivial computation. So we say,

okay, for starters we put the five at the end, but that's no good, so we're gonna

swap the five with the twelve. And now we see we're not out of the woods. No longer

is there a heap violation at the node twelve; that's been fixed, we've made it a

leaf. But we've pushed the heap violation up. So now instead it's the eight

that has a problem. The eight used to have two children with keys twelve and

nine; that was fine. Now the eight has two children with keys five and nine. The five

is less than the eight; that's a violation of the heap property. But again, that's the

only violation of the heap property. There's no other node you could have

screwed up, because eight was the only node whose children we messed around

with. Alright, so now we just do it again. Let's try the same thing again: locally fix

the heap violation by swapping the five with the eight. And now we see we've restored

order. The only place where there could possibly be a violation of the heap

property is at the root. When we did this swap, the only node whose

children we really messed with was the root, the four, and fortunately its new child

has key five, which is bigger than it. One subtle point: you might be thinking

that in addition to possibly screwing up the root node by messing around with its

children, maybe we could have screwed up the twelve by messing around with its

parent. Alright, its parent used to be five, and now its parent is eight. So is

there some possibility that its parent would all of a sudden have a key bigger

than it? But if you think about it, this eight and this twelve were in a

parent-child relationship in the original heap, right? Back in the blue heap, the

twelve was under the eight. Now the twelve is under the eight yet again. Since we had

the heap property for that pair before, we certainly have it now. So in general, as

you push this five up the tree, there's only going to be one possible edge that

could be out of order, and that's between where the five currently resides and

whatever its parent is. So when the five's parent was twelve, that was a violation;

when the five's parent was eight, that was a violation; but now that we've pushed it up

two levels and the five's parent is four, that's not a violation, because four is less than

five. So in general, step two of insertion is you repeat this swap, which is called a

lot of different things. I'm gonna call it bubble-up, because that's how I learned it

more years ago than I care to admit, but this is also sometimes called sift-up,

heapify-up, and so on. So now I've told you just how to implement insertion by

repeated bubble-ups in a heap data structure. And this is really how it works;

there is nothing I haven't told you. But, you know, I'm not going to really fill in

all the details; I'll encourage you to do that on your own time, if it's

something that interests you. The two main things that you should check are, first

of all, that this bubbling-up process is gonna stop, and when it stops, it stops with the

heap property restored. The second thing that needs to be checked, which I think is

easier to see, is that we do have the desired running time, logarithmic in

the number of elements in the heap. The key observation is that, because this

is a perfectly balanced binary tree, we know exactly how many levels there are:

there are basically log base two of n levels, where n is the number of items in

the heap. And what is the running time of this insertion procedure? Well, you only do

a constant amount of work at each level, just doing the swap and comparison, and

in the worst case, you'll have to swap at every single level, and there's a

logarithmic number of levels. So that's insertion. Let's now talk about how

we implement the extract-min operation. Again, I'm gonna do this by

example, and it's gonna be by repeated invocation of a bubble-down

procedure. So the extract-min operation is responsible for removing from the heap

an object with minimum key value and handing it back to the client on a silver

platter. So we pretty much have to rip out the root. Remember, the minimum is

guaranteed to be at the root. So that's how the extract-min

subroutine has to begin: we just pluck out the root and hand it back to the client. This

removal of course leaves a gaping hole in our tree structure, and that's no good. One

of the invariants we're responsible for maintaining is that we always have an as

perfectly balanced as possible binary tree, and if you are missing a root, you

certainly don't have an almost perfect binary tree. So, what are we going to do

about it? How do we fill this hole? Well, there's pretty much only one node that

could fill this hole without causing other problems with the tree structure, and that

is the very last node, the rightmost leaf at the bottom level. So one simple

fix is to move that up and have it take the place of the original root. In this

case, the thirteen is going to get a massive promotion and get teleported all

the way up to be the new root of this tree. So now we've resolved our structural

challenges: we now again have an as-perfectly-balanced-as-possible binary

tree. But of course, now we've totally screwed up the heap property, right? The

heap property says that at every node, including the root, the key value at that

node has to be no more than the keys of both of its children, and now it's all messed up:

at the root, the key value is actually bigger than both of the children.

And matters here are a little bit more tricky than they were with insertion,

right? When we inserted at the bottom, every node had a unique parent. If

you wanna push a node upward in the tree, there's sort of only one place you can go:

all you can do is swap with your parent, unless you're going to try to do

something really crazy. If you want to do something local, you only

have a unique parent you can swap with. Now, when you're trying to push nodes down

to their rightful position in the tree, there are two different swaps you could do,

one with the left child, one with the right child, and the decision that we make

matters. To see that concretely, let's think about this example. There's this

thirteen at the root, which is totally not where it should be, and there are the two

children, the four and the eight, and we could try swapping it with either one. So

suppose we swap it, in a misguided way, with the right child, with the eight. So now

the eight becomes the root, and the thirteen gets pushed down a level. So on

the one hand, we made some progress, because now at least we don't have a

violation between the thirteen and the eight. On the other hand, we still have

violations involving the thirteen: the thirteen is still violated with respect to

the twelve and nine. And moreover, we've created a new problem between the eight

and the four, right? Now that the eight is the root, it's still bigger than its

left child, this four. So it's not even clear we made any progress at all when we

swapped the thirteen with the eight, okay? So that was a bad idea. And if you think

about what made it a bad idea, the stupid thing was to swap with the

larger child. That doesn't make any sense. We really want to swap with the smaller

child. Remember, every node should have a key no bigger than those of its children. So

if we're going to promote either the four or the eight, one of those is going to

become the parent of the other. The parent is supposed to be smaller, so evidently we

should take the smaller of the two children and swap the thirteen with that.
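
In code, this "pick the smaller child" rule might look like the following sketch, assuming a 1-indexed array as before; the helper name is my own, not from the lecture:

```python
def smaller_child(heap, i):
    """Return the position of the smaller child of node i in a 1-indexed
    heap array (heap[0] is an unused placeholder), or None if i is a leaf."""
    n = len(heap) - 1          # number of objects in the heap
    c = 2 * i                  # left child
    if c > n:
        return None            # i has no children at all
    if c + 1 <= n and heap[c + 1] < heap[c]:
        c += 1                 # the right child is the smaller one
    return c                   # the candidate to swap with during bubble-down
```

So with a thirteen at position one and children with keys four and eight at positions two and three, this helper would point us at the four, not the eight.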

So we should swap the thirteen with the four, not with the eight. And now we observe a

phenomenon very much analogous to what we saw in insert. When we were bubbling up

during insertion, it wasn't necessarily that we fixed violations of the heap

property right away; we would fix one and then introduce another one

higher up in the tree. And we had confidence that eventually we could just

push this violation up to the root of the tree and squash it, just like we're trying

to win a game of Whack-a-Mole. Here, it's the same phenomenon, just in the opposite

direction. So we swap the thirteen with the four. It's true we've created one new

violation of the heap property, again involving the thirteen, now with its

children nine and four. But we haven't created any others: we've pushed the

heap violation further down the tree, and hopefully, again like in Whack-a-Mole,

we'll squash it at the bottom. So after swapping the thirteen and the four, now we

just gotta do the same thing. We say, okay, we're not done; we still don't have

a heap. This thirteen is bigger than both of its children. But now, with our

accumulated wisdom, we know we should definitely swap the thirteen with the

four; we're not gonna try swapping with the nine, that's for sure. So we move the

four up here, and the thirteen takes the four's old place. Boom! Now we're done.
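
The whole extract-min procedure we just walked through can be sketched as follows, again with a 1-indexed array; the names here are mine, not from the lecture:

```python
def extract_min(heap):
    """Remove and return an object with minimum key from a nonempty heap,
    stored as a 1-indexed list (heap[0] is an unused placeholder)."""
    n = len(heap) - 1
    root = heap[1]                 # the minimum always lives at the root
    heap[1] = heap[n]              # promote the last leaf to the root
    heap.pop()
    n -= 1
    i = 1
    while 2 * i <= n:              # bubble down until i has no children
        c = 2 * i                  # left child
        if c + 1 <= n and heap[c + 1] < heap[c]:
            c += 1                 # always swap with the SMALLER child
        if heap[i] <= heap[c]:
            break                  # heap property restored
        heap[i], heap[c] = heap[c], heap[i]
        i = c
    return root
```

For instance, starting from the heap `[None, 4, 5, 9, 12, 8]`, successive calls return 4, 5, 8, 9, 12.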

So now we have no violations remaining. The thirteen in its new position has no

children, so there's no way it can have any violations; and the four, because it was

the smaller child, is no bigger than the nine, so we haven't introduced a

new violation there; and again we have these consecutive fours, but we know that's

not gonna be a problem, because those were consecutive fours in the original heap as

well. So you won't be surprised to hear that this procedure, by which you push

something down by swapping it with its smaller child, is called bubble-down,

and extract-min is nothing more than taking this last leaf,

promoting it to the top of the tree, and bubbling down until the heap violation has

been fixed. So again, on a conceptual level, that's all of the ingredients necessary

for a complete from-scratch implementation of extracting the minimum from a heap, and,

as before, I'll leave it for you to check the details. First of all, you should

check that in fact this bubble-down has to halt at some point, and when it halts, you

do have a bona fide heap; the heap property is definitely restored. And second

of all, the running time is logarithmic. Here the running time

analysis is exactly the same as before: we already observed that the height

of a heap, because it's perfectly balanced, is essentially the log base two of the

number of elements in the heap, and in bubbling down, all you do is a constant

amount of work per level; all you have to do is a couple of comparisons and a swap. So,

that's a peek at what's under the hood in the heap data structure. A little bit

about the guts of its implementation. So after having seen this, I hope you feel

like a little bit more hard-core of a programmer, a little bit more hard-core of

a computer scientist.
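
If you do want to code this up yourself, here is one way the pieces of this video might fit together: a minimal min-heap supporting insert (with bubble-up) and extract-min (with bubble-down), stored in a 1-indexed Python list. This is just a sketch; the class and method names are mine, not from the lecture.

```python
class MinHeap:
    def __init__(self):
        self.a = [None]                      # 1-indexed: a[0] is unused

    def insert(self, key):
        a = self.a
        a.append(key)                        # new rightmost leaf, bottom level
        i = len(a) - 1
        while i > 1 and a[i // 2] > a[i]:    # bubble up along the parent edge
            a[i // 2], a[i] = a[i], a[i // 2]
            i //= 2

    def extract_min(self):
        a = self.a
        n = len(a) - 1
        root = a[1]                          # the minimum lives at the root
        a[1] = a[n]                          # promote the last leaf
        a.pop()
        n -= 1
        i = 1
        while 2 * i <= n:                    # bubble down
            c = 2 * i                        # left child
            if c + 1 <= n and a[c + 1] < a[c]:
                c += 1                       # swap with the smaller child
            if a[i] <= a[c]:
                break                        # heap property restored
            a[i], a[c] = a[c], a[i]
            i = c
        return root

h = MinHeap()
for k in [12, 9, 4, 8, 4, 11]:
    h.insert(k)
out = [h.extract_min() for _ in range(6)]
print(out)   # the keys come back out in sorted order: [4, 4, 8, 9, 11, 12]
```

Inserting n keys and then extracting them all, as in the demo, is exactly heapsort: n inserts and n extract-mins, each logarithmic, for O(n log n) total.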