The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

來自 Stanford University 的課程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

467 個評分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

從本節課中

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim RoughgardenProfessor

Computer Science

So just like in the Hopcroft Ullman analysis we're going to think about the

total amount of work done by our union fine data structure as constituting 2

pieces, work done visiting good nodes and work done visiting bad nodes.

The previous quiz says that we have a bound on visits to good nodes, even on a

per operation basis. At most inverse acronym for N, good nodes

are visited every single operation. What remains is to have a global bound on

the number of visits to bad nodes. The argument will be to show, that over

an arbitrary sequence of m find and union operations, the total number of visits to

bad nodes is bounded above by big O of n, times, alpha of n.

So here is the crux of the argument. Here is why, when you do a visit to a bad

node, the subsequent path compression massively increases the gap between that

object's rank and the rank of the parent. So, let's freeze the data structure at

the moment where we found the operation and makes a visit to a bad object,

call that Bad Object X. Let's think about what the world with the

data structure must look like in this scenario.

So we've got our bad object X, it may or may not have nodes pointing to

it. We're not going to care.

By virtue of it being bad and meeting the third criterion we know the rank of X is

at least 2. It's delta value could be anything.

Whatever it is, let's call it. Okay.

So because x is not a root, it has some parent, call that parent p.

And not only does x have ancestors, but because it meets the fourth criterion, we

actually know it has an ancestor y who has the same delta value as x.

That is it has an ancestor y With delta of y equal to k.

It is possible that p and y are the same object ir they could be different.

It's not going to matter in the analysis. So we're calling that the statistic delta

is only definied for non roots, we can conclude that y is also not the

root. It must then have some parent p prime.

P prime might be a root or it might not, we don't care.

[SOUND] So, now let's understand of the effect of the past compression, which is

going to happen subsequent to this find operation.

X's parent points is going to be rewired to the root of this tree.

The root of this tree is either at P prime, or even higher than that.

Given that fact let's understand how much bigger the rank of X's new parent P

primer higher is compared to the rank of it's old parent P.

So the rank of x's new parent is at least the rank of P prime.

So if p' is in fact the root, then the rank of x's new parent is just the rank

of p'. Otherwise, this new parnet is even higher

than P prime because its ranks only increase going up the tree, that means it

would only be higher than the rank of P prime.

How does the rank of P prime compare to its child, that of its child y? Well and

here is a key point, because delta of y is equal to k.

Remember what the definition of delta is, it quantifies the gap between the rank of

an object, and that of its parent. We are going to use that here for y, and

its parent P prime. It means the rank of P prime is so big

It's at least the function, a sub k applied to y's rank.

So our third and final inequality is the same as the first one.

So it could be the case that y actually is the same as p, it actually is x's old

parent. In that case the rank of x's old parent

is precisely the rank of y. Otherwise, y is even higher up X's old

parent P and therefore, by the monotonicity of rank, the rank of Y is

even bigger, than the rank of X's old parent.

So now in this chain of inequalities, I want you to focus on the left-most

quantity, and the right-most quantity. What does this say, at the end of the

day? It says when you apply path compression to X, it acquires a new

parent. And the rank of that new parent is at

least the A sub K function applied to the rank of it's old parent.

Again, path compression at the very least applies the A sub K function to the rank

of X's parent. So now let's suppose you visit some bad

object X, not just once, but the same object X, while it's bad over and over

and over again. This argument says that every such visit

to the object x while it was bad applies the function A sub k to the rank of its

parent. So in particular, let's use r to denote

the rank of this bad object x and, again, by virtue of it being bad, r has to be at

least 2. Imagine we do a sequence of R visits to

this object X while it's bad. Each of those visits will result in it

acquiring a new parent. And that new parents rank is much bigger

than the rank of the previous parent. It's at least A sub K, applied to the

rank of the previous parent. So, after R visits,

that's applying the function A sub K, R times to the rank of X's original parent

which forces rank at least that of X, at least R.

Well we have another name for applying the function A sub K, R times to R.

Remember this is just by definition of the acronym function A sub K plus 1.

Applied to r. So what does this mean? Well this means,

that after r visits, to a bad object x, that has rank r, the rank of x's parent

has to have grown, so much, that that growth is reflected in our statistic

delta, that measures the gap in the rank, between objects, and the objects parent.

So remember how we defined delta of x. It's the largest value of k so that the

rank of x's parent is even bigger than a sub k applied to the rank of x.

So what this inequality shows is that every r Parent pointer updates to x allow

you to take k to be even bigger than before.

You can bump up k by 1 and still have this inequality be satisfied.

That is, every r visits to a bad object of rank r, delta has to increase by at

least 1. But there's only so many distinct values

of the statistic delta of X can take on. It's always non negative, it's always an

integer, it can only increase, and it's never bigger than the inverse Ackermann

function alpha of N. So therefore, the total number of visits

to an object X, while it's bad, over an arbitrary sequence, can not be more than

R, the number of visits needed to increment delta of X,

times the number of distinct values, which is alpha of N, the inverse

function. So now we've done all the hard work.

We're almost there, we just need 1 final slide putting it all together.

All right, so to see how all of this comes together, let's first recall that

all that we need to do is bound the total number of visits to bad objects over this

entire sequence of union and find operation.

So in the previous slide, we showed that for a bad object x, with rank r, the

total number of times it's visited while it's bad, is bounded above by r times the

inverse Ackermann function of n. So lets sum that over all of the objects

to get our initial bound on the total number of visits to bad objects.

So now, we have to be a little careful here, because there are n objects x.

and ranks can be as large as log n. So a naive-bound would give us something

like n times LogN times alpha of n, and that's not what we want.

We really want n times alpha of n. So we need to use the fact that not all

big nodes can have big rank. And that's exactly what the rank Lemma

says. So to rewrite this, in a way that we can

apply the rank Lemma, bounding a number of objects with a given rank, lets bucket

the objects according to their rank. So rather than summing over the objects,

we're going to sum over possible ranks r, and then we're going to multiply times

the number of objects that have that rank R.

While I'm at it, let me pull the alpha of n factor, which doesn't depend on the

object out in front of the sum. For every value of R, the rank Lemma

tells us there are at most N/(2^R) objects with that rank.

So this factor of N is independent of the rank R, so let me pull that out in front

of the sum. And when the dust settles we get n times

the inverse acronym function of n times the sum of non-negative integer r of r

divided by 2 to the r. So, we've seen this kind of sum without

the r in the numerator, just a geometric sum 1/2^r.

We know that's bounded by a constant, and more generally throwing an r in the

numerator, well that's no match for this exponentially decreasing denominator.

So, this sum still evaluates to, a universal constant.

I'll let you check that in the privacy of your own home.

And so that's, give us a bound of O(n) * the inverse Ackermann function of n, on

the total number of visits to bad objects, since we also have a per

operation bound of alpha(n) on the number of visits to good nodes.

Combining the work of the two cases we get O(n) m+n times inverse Ackermann

function of n. And again, the interesting case here is

when m is omega of n. Otherwise, you can just do this analysis

separately for each 3 in the data structure.

And, there you have it. You now understand in full detail one of

the most amazing results in the history of algorithms and data structures.

So, Tarjans bound is unimagenably close to being linear without actually being

linear, it's off by this inverse Acromon function factor.

Now, from a practical perspective for any imagenable value of N, alpha of N is

almost 4, so this gives you a linear time bound for Imaginable values of n.

In fact, even the Hopcroft-Ullman log starbound, logs stars at most 5, for any

imaginable value of n. So that also is in, in essence linear

time bound for all practical purposes. But theorotically speaking, we have not

proven a linear time bound, on the amount of work done by the union find Data

structures. So the Pavlovian response, of any

algorithms researcher worth their salt, would be to ask, can we do better? And,

we could ask that question in a couple different senses.

The first sense would be; well maybe we could have a sharper analysis, of this

exact data structure. Maybe, union by rank, and path

compression is sufficient To gaurantee a linear amount of work.

After all, we didn't need to change the data structure, we only needed to change

the analysis, to sharpen the log* bound, to an alpha of n bound.

So this question, remarkably, Tarjan answered in the negative, in his original

paper. He showed that, if you use this exact

data structure, union by rank and path compression, then they are arbitrarily

large Sequences of unions and finds of arbitrarily large collections of objects,

so that this data structure actually performs asymptotically, M the number of

operations times the inverse Ackermann function, N the number of objects amount

of work. That is keeping the data structure fixed,

this analysis cannot be asymptotically improved.

This data structure fundamentally has Worst case performance, governed by this

insane inverse Ackermann function. So this is already a mind boggling fact

that indeed Tarjan in the conclusion of his paper, notes that it's remarkable to

have such a simple algorithm with such a complicated running time.

But you could also ask the question, could we do better perhaps by improving

or changing the data structure. After all, by adding path compression, we

got a qualitative improvement in the average operation time.

That dropped from log to alpha of n. Perhaps there's yet another optimization

out there waiting to be discovered that would drop the amount of work per

operation over a sequence down to constant time per operation, linear work

overall. Tarjan, in his paper made the bold

conjecture that there is no linear time method, no matter how clever you are.

And remarkably, that conjecture has since been proved.

It was proved for certain classes of data structures both in Tarjan in his original

paper and in And a follow-up paper in '79.

But the final version, of this conjecture, was proven by Friedman and

Sachs, back in 1989. No matter how clever you are, no matter

what union-find data structure you come up with, there will be arbitrarily long

sequences of operations, so that the average amount of work you do per

operation Is, the inverse Ackerman function of, the number of objects, in

the data structure. Pretty unbelivable,

so let me just wrap up with a historical comment.

So, it's a full disclosure, I wasn't quite alive when this result came out

but, in talking to, in reading about it, and talking to senior researchers about,

my sense is that it was really sort of a water-shed moment, for the field of data

structures and And algorithms. Specifically it confirmed the belief of,

people working in algorithms and data structures, that the field posessed

surprising depth, and beauty. There had, of course, been earlier

glimpses of this, we mentioned in the optional, material in part 1, Kanuth's

analysis of linear probing, back in the early 1960's.

But this was really something for the worst case analysis, of algorithms and

data structures. So the fact that such a practical and

naturally arising problem, in algorithms and data structures, requires,

necessarily, the understanding of the Ackermann function and it's inverse.

A function, mind you, which was first Proposed and defined back in 1920s in the

service of recursion theory, almost 10 years before Tarjan was doing his work on

models of computation. It was a real eye opener and it showed

that this field is something that will keep many generations of scientists quite

busy for many years to come.