The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

439 ratings

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim RoughgardenProfessor

Computer Science

So this algorithm will prove the correctness of Kruskal's minimum cost

spanning tree algorithm. So to prove this correctness theorem,

let's fix an arbitrary connected input graph G.

And let's let T star denote the output of Kruskal's algorithm when we invoke it on

this input graph. So, just like with our high level proof

plan for Prim's algorithm, we're going to proceed in three steps.

We're first going to just establish the more modest goal, the Kruskal's algorithm

outputs a spanning tree. We're not going to make any initial

claims about optimality. So those are the first two steps, one to

argue there's no cycles, one to argue that the output's connected.

And then, in the third step, relying on the cut property.

We'll say, it's not just a spanning tree, it's actually the minimum cost spanning

tree. For this proof, I am going to assume

you're familiar with the machinery we developed to, to prove Prim's algorithm

is correct. So the first step of the proof is going

to be very easy. So one important property of a Spanish

[INAUDIBLE] there's no cycles and it's quite obvious looking at the pseudocode

of Kruskal's algorithm that its not going to produce any cycles.

Every edge that creates a cycle is explicitly excluded from the output.

What's a little less obvious is that as long as the, input graph is connected,

then Kruskal's algorithm will output a connected sub-graph.

And therefore a spanning tree. So to argue the output's connected, we're

going to have two sub-parts of the proof. The first thing I want you to do is

recall what we call the empty cut lemma. So this was a way of talking about when a

graph is disconnected or equivalently when a graph is connected.

And you'll recall a graph is connected if and only if for every cut there's at

least one crossing edge. So to prove this second step of this

proof that T star is connected we just have to prove that for every single cut

it had at least one edge crossing that cut.

So, that's what we're going to show. To get started let's fix an arbitrary cut

A, B. Using the assumption the input graph is

connected, it certainly contains at least one edge that crosses this cut.

The question is whether or not T star contains a crossing edge but certainly

the original graph G contains a crossing edge.

So, here's the key point of the argument. Kruskal's algorithm, by definition, it

makes a single scan through all of the edges.

That is, it considers every edge of the original input graph exactly once.

So, what I want you to do is, I want you to think about this cut A, B which has at

least one edge of G crossing. And I want you to fast forward Kruskals

algorithm until the moment that it first considers an edge crossing this cut AB.

The key claim is that this first edge seen by Kruskal's algorithm, is

definitely be, going to be included in the final output, T star.

So why is this true? Well, let me remind you of the lonely cut

corollary. The lonely cut corollary says that if an

edge is the sole edge crossing a cut, if it's lonely in a cut, then it cannot

participate in any cycle, right?

If it was in a cycle, that cycle would loop back around and cross the cut a

second time. Now, what does that have to do with the

picture here? Well, if this is the first edge that

Kruskal sees crossing this cut, then certainly the tree so far T star contains

nothing crossing this cut, right? It hasn't even seen anything crossing

this cut yet. So at the moment it encounters the first

edge, there's no way including that first edge will create a cycle because that

first edge is going to be lonely in this cut AB.

So again summarising, the first edge crossing a cut is guaranteed to be chosen

by a cross because the algorithmic cannot create a cycle.

That's why there's at least one edge of Kruskal's output crossing this particular

cut AB. Since AB was arbitrary, all edges,

all cuts have some edge of T star crossing them, that's why T star is

connected. So this completes the part of the proof

where we argue that Kruskal's algorithm outputs a spanning tree.

Now we have to move on and argue that it's actually a minimum cost spanning

tree. We're going to argue this in the same way

as we did for Prim's algorithm. We're going to argue that every edge ever

selected by Kruskal's algorithm is justified by the cut property.

Satisfies the hypotheses of the cut property.

Recall from our correctness proof for Prim's Algorithm,

this is enough to argue correctness. That the output is a minimum cost

spanning tree, right?

An edge justified by the cut properties, not a mistake.

It's got to belong to the minimum cost spanning tree.

If you have an algorithm that outputs a spanning tree, and it never made a

mistake, that's got to be the minimum cost spanning tree.

And that's going to be the case for Kruskal.

Now this statement was easy to prove for Prim's algorithm and that's because the

definition of Prim's algorithm, it selects edges based on being the cheapest

in some cut. So it's tailor made for the application

of the cut property. Not so for Kruskal's algorithm.

If you look at the pseudocode, nowhere does the pseudocode discuss taking cheap

edges across cuts. So we have to show that Kruskal's

algorithm in effect is inadvertently at every edge picking the cheapest edge

crossing some cut. And we actually have to exhibit what that

cut is in our proof of correctness. So, that's what we have to do here.

So to argue that, let's just sort of freeze Kruskal's algorithm at some

arbitrary iteration, where it adds a new edge.

We need to show that this edge is justified by the cut property.

So, maybe this edge has end points U and V, and let's let capital T denote the

current value of the edges selected by the algorithm so far.

So let's think about what this state of the world looks like, at a, sort of,

intermediate iteration of Kruskal's algorithm.

So Kruskal's algorithm maintains the invariant there's no cycles but remember

it doesn't maintain any invariant of the current edges forming a connected set so

in general in an intermediate iteration of Kruskal's algorithm, you've got a

bunch of pieces, a bunch of little mini trees floating around the graph.

Connect the components if you like, with respect to the current edge shed, capital

T. And there can be some, you know, isolated

vertices floating around as well. What more can we say?

Well, if the current edge has endpoints U and V and Kruskal is deciding to add this

edge UV to the current set capital T, we certainly know that capital T has no path

between U and V, right?

If it did have a path, then this new edge would create a cycle, then Kruskal would

skip the edge. Since it isn't skipping the edge, U and V

are currently in different pieces. They're in different components with

respect to the current edge set. Now remember, ultimately, if we're going

to apply the cut property, we have to somehow exhibit some cut from somewhere,

justifying this particular edge. So here's where we get the cut from, and

it's going to be a very similar argument to when we proved the empty cut limit

characterizing disconnectedness in terms of empty cuts.

We're going to say look, with respect to the tree edges we have so far, with

respect to the green edges, there's no path from U to V.

That means we can find a cut such that U is on one side, V is on the other side,

and there's no edges crossing the cut. So, here's the cool part of the proof.

And this is also the part where we actually use the fact that Kruskal was a

greedy algorithm. That it considers the edges from cheapest

to most expensive. Notice that we haven't actually used that

fact up to this point and we better use that fact.

So, the claim is that not only is this edge we're adding UV, not only does it

cross this cut AB, but actually it's the cheapest edge crossing the cut.

Nothing from the original input graph G that crosses it can be cheaper.

So why is that true? Well, let's remember how we wrapped up

the previous slide when we were arguing that the output of Kruskal's algorithm is

connected. What did we argue?

We said, well, fix any cut you like. Any cut of the graph.

And freeze Kruskal's algorithm the very first time it considers some edge

crossing that cut. We noticed that Kruskal's algorithm is

guaranteed to include that first edge crossing the cut in its final solution.

There's no way that, that first edge considered can create a cycle with

respect to the edges already chosen. So, it's not going to trigger the

exclusion criterion in Kruskal's algorithm.

And that edge is going to get picked. So what's the significance about being

the first edge crossing a cut well because of the gradient criterian because

Kruskal considers edges from cheapest and most expensive.

The first edge it encounters crossing a cut is also necessarily the cheapest edge

that's crossing the cut. So here's how we weave these different

strands together to complete the correctness proof.

Alright, so let's remember where we are. We're focusing on a single iteration of

Kruskal's algorithm. It's about to add this edge U, V to the

edges, capital T, that it's chosen in the past.

We've exhibited a cut. A, B with a property that no previously

chosen edges, no edges of capital T cross this cut, and UV is going to be the first

chosen edge crossing this cut. We just argued, that Kruskal's algorithm

is guaranteed to pick the first edge crossing a cut.

So by virtue of their not yet being any chosen edges, crossing the cut AB, it

must be the case that this current edge UV is the first one that's ever been seen

by Kruskal's algorithm that crosses this cut AB.

Now, in being the first edge it's ever seen crossing this cut. It must also be

the cheapest edge of the input graph that crosses this cut.

That is the hypothesis of the cut property.

That is why this edge UV and this current arbitrary iteration is justified by the

cut property.

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.