The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

來自 Stanford University 的課程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

557 個評分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

從本節課中

Week 1

Two motivating applications; selected review; introduction to greedy algorithms; a scheduling application; Prim's MST algorithm.

- Tim RoughgardenProfessor

Computer Science

Okay. So in this video we're going to begin our discussion about why Prim's

algorithm is correct. Why always, for every connected graph

outputs a minimum spanning tree of that graph.

For this video, we're going to content ourselves with a much more modest school.

We're only going to prove for now the Prim's algorithm outputs a spanning tree.

We're not going to make any claims yet about optimality.

Even just this fact is not trivial and proving it will give us a good

opportunity to get our hands dirty with some basic properties of graphs and

specifically graph cuts. Graduates of part 1 of this online class

of course are already familiar with graph cuts. We studied them at length via

Karger's randomized algorithm for computing the minimum cut of a graph.

So, the concept is the same here, let me state it again to jog your memory.

So a cut of a graph is simply a partition of its vertex set, two groups, and each

of those two groups should be non-empty. So pictorially, we envision some of the

vertices of G, this blob A being in one group,

and the rest of the vertices, this graph B being in a different group.

Now, what's up with the edges? How can they be distributed in this

picture? Well, the two endpoints of an edge,

there's three cases, either both of the endpoints can be in

the set A. So there's various edges internal to A.

Similarly, an edge might have both of its endpoints inside of B.

But we're going to be most interested in the third case,

edges that have one point exactly in each of A and B.

So these are edges that we say cross the cut, A, B.

So hopefully the definition of a cut seems simple enough, but cuts in

particular their relationship to edges can be quite interesting, quite useful.

So as shown here in the picture, of course for a given cut, there can be many

edges crossing it. by the same token for a given edge of a

graph, in general, there will be many cuts of the grap,

that's, that edge crosses. So, to understand this a little bit

better, let's just review a simple property that cuts through the graph.

Let me just ask you just how many there are.

Specifically, for a graph that has n vertices, roughly how many cuts does it

have? Roughly n, roughly n squared, roughly

2^n, or roughly n^n? Now, none of these four answers is

exactly right, but one of the four is a lot closer to the exact expression than

the other three and I'm asking you, which of them is it?

Alright. So the correct answer is the third one,

2^n. A graph of n vertices has essentially 2^n

cut, so there's an exponential number of cuts there's a lot of them.

So why is this true? Well, in effect you can imagine making a

binary decision for each of the n vertices.

They either go into A. What were they going to be?

So n binary decisions results in 2^n different outcomes.

Now why is this slightly incorrect? Well, in fact, a cut has to have two

non-empty sets. A is not allowed to be empty,

B is not allowed to be empty, so that rules out two of the

possibilities. So actually, strictly speaking, it's 2^n

- 1 different cuts of a graph. So what we're going to do next is we're

going to state and prove three easy facts about cuts in graphs.

Once we have these three easy facts, we will be able to prove the claim at the

beginning of this video, namely the Prim's Algorithm always

outputs a spanning tree. The first of these three properties about

cuts, I'm going to call the empty cuts lemma.

The point of the empty cut lemma is to give us a characterization that is a new

way of saying when a graph is connected. So in particular, I'm going to phrase in

terms of a graph not connected. And the claim is that a graph is not

connected if and only if we can find a cut of the graph that has no edges

crossing it. So remember how we defined a graph being

connected, that means for any two vertices in the

graph we can find a path in the graph from one vertex to the other.

So what we're saying is that being not connected,

that is, there existing a pair of vertices with no path between them is

equivalent to there being a cut with no crossing edges.

So let's go ahead and prove this real quick.

So as an if and only if statement, really this proof, we have to do in two

parts. First, we have to prove that assuming the

first statement, we can derive the second.

Then we have to show that assuming the second statement, we can derive the

first. I think the easier direction is to assume

the right-hand side and then derive the left-hand side.

So let's start with that one. That is, consider a graph G so that

there's a cut, A, B with no edges of G crossing this cut.

The plan is to exhibit a pair of vertices that do not have a path between them,

there, thereby certifying that the graph is not connected.

So, it's pretty easy to figure out which pair of vertices we should look at,

just take one vertex from each side of the cut which has no crossing edges.

So why is it that there's no path from U to V in the graph G?

Well the path from U to V would surely have to cross the cuts, A, B, but there's

no edges available for crossing the cut. So therefore, this path from U to V

cannot exist. So that completes the first part of the

proof. We assume the right-hand side, we derive the left-hand side,

now we start all over again, but we assume the left-hand side and we have to

prove the right-hand side. So by virtue of, by the assumption that

the graph is not connected, there has to exist a pair of verticies U and V that

have no path between them. We are now responsible for exhibiting

some cut A, B such that no edges of the graph G crossing.

So where are we going to get these sets capital A and capital B from?

Well, here is the trick, which is going to make the proof go really nicely.

We define the set of verticies of capital A to be those reachable from U in the

graph G. Another way to think about this is that

capital A is simply used connected components in the sense that we discussed

in part 1 of the course. Now because we want to cut and a cut is

our partition, we better well put in the group, capital B, all of the verticies

that are not in A. If you like, this is all of the connected

components other than the one that contains U.

Note that by definition, U is in capital A,

certainly U is reachable from itself. And by assumption, V and U are not

reachable from each other, so V is going to be in capital B.

So neither of these sets is non-empty. This is indeed a bonafide cut of the

graph G. All that remains is to notice that there

are no crossing edges across this cut. And why is that true?

Well, if there was an edge crossing the cut A, B with one endpoint in A, one

endpoint in B. Well, by definition, there are paths from U to everything else in A,

so if there is any edge sticking out of A, that would give us a path to some

vertex in B. But, B definition of vertices not

reachable from capital A, so that's a contradiction.

So again, the point is that if there were edges crossing this cut, then we can

expand A and make it even bigger. So therefore, there aren't any edges

crossing the cut. The cut is empty, that's what we needed

to prove. Assuming the graph was disconnected, we

have exhibited a cut, A, B with no crossing edges.

So that wraps up of the first of our three facts, and in fact, the most

difficult of our three facts about cuts in graphs.

And again,, what did the empty cut lemma say?

It gives us a new way of talking about whether or not a graph is connected.

So it's disconnected if and only if there's an empty cut.

It's connected if and only if there are no empty cuts.

So that's the keypoint from this slide. Let's now knock off the other two facts

we're going to need. The first one I'm going to call the

double crossing lemma. In essence, what the double crossing

lemma says, is that, if a cycle in a graph crosses a cut, then it has to cross

it twice, it cannot cross it only once.

So pictorially, we look at a cut of a graph, so there's the two vertex groups A

and B. By hypothesis, there's some edge E with

one endpoint in each side, and by assumption, this E, this edge E,

participates in some cycle that we're calling capital C.

And if you look at the picture, you realize that the claim in this lemma is

obvious, that, because the cycle has to loop back

on itself, if it has an edge with one endpoint on either side, there has to be

a path connecting the two dots, connecting those two endpoints back to

each other and that path has to cross back for, over this cut A,

B. Indeed, the double crossing lemma is a

special case of a stronger statement which is equally easier to see, which is

that if you take any cut of a graph and you take any cycle you know, it starts

and ends at the same point, then it has to cross this cut an even

number of times. It might cross it 0 times, but it's not

going to cross it once. It could cross it twice.

It could cross it four times, if it crisscrosses back and forth.

It could cross it six times, and so on. But if it crosses it strictly more than 0

times, then it has to cross it at least twice.

That's the point of the double crossing lemma.

So, we'll use this in its own rights later on.

But I'm also, for the moment, interested in easy corollary of the double crossing

lemma. I will call this the lonely cut

corollary. Let me tell you the point of the lonely

cut corollary. In general, in these spanning tree

algorithms, to ensure that we output a spanning tree,

then we have to, in particular, make sure we don't create any cycles.

The point of this corollary is it's a tool to argue that we don't create

cycles. So how can we be sure that an edge

doesn't create cycles? Well, here is a way.

Suppose there's a cut, so we're looking at an edge E, suppose we can identify a

cut A, B so that edge E is the only cut crossing it, it's the lonely edge

crossing this cut. Well then, by the double crossing lemma,

there is no way this thing is in any cycle.

If it were in a cycle and a cross to cut, that cycle would have to cross it again

and it's edge wouldn't be lonely, it would have company.

So if you're lonely on a cut, it mean's you cannot be in a cycle.

So now we've got all of our ducks lined up in a row and we're ready to prove the

first part of the correctness of Prim. That is, we're ready to argue that Prim's

algorithm, given a connected graph, outputs a spanning tree.

Again, for the moment, we're making no claims about optimality, that will be in

the next video. So we're going to make this argument in

three steps. And for the first step, you might want to

go look again at the pseudocode of Prim's algorithm just to remember what the

notation was. The first step, we're just going to

notice that the semantics of the algorithm are respected.

So the algorithm maintains two different sets throughout its evolution.

On the one hand it maintains a set capital x, intended to be the vertices

spanned so far. The other hand, it maintains a set of

edges, capital T, the edges that have been picked so far.

And the intent was that the current edges capital T always spans the current vertex

at capital x. So the first thing is just to verify that

that is in fact true. This I'm not going to prove formally.

In my experience, students find this kind of obvious and the intuition is correct.

if you want a rigorous proof, go go ahead and fill in the details yourself.

It's a straightforward induction with no nasty surprises.

[SOUND] Now, we're trying to argue the output of this algorithm is a spanning

tree. So let's recall what that means.

What is it that we have to check? So there's two properties.

First of all, there can't be any cycles, there can't be any loops.

Second of all, it has to span all of the vertices.

It has to be a path inside the tree edges from any vertex to any other vertex.

So let's go ahead and prove both those things in reverse order.

So, the second step of the proof is going to be to argue that the algorithm outputs

something which does span all of the vertices.

So at the end of the day, we'll have a path from any vertex to any other vertex

using only the edges in our chosen set, capital T. Now, by part one of this

proof, all we need to prove is that the algorithm halts with capital X equal to

capital V, then we know that capital T spans everything in V.

So how could that not happen? How could Prim's algorithm somehow halts

with this spanned vertices capital X, not being all of capital B,? We'll go back

and check out the pseudocode and look at the main wild loop.

So every wild loop, every iteration, we add one new vertex to capital X. What

could go wrong? The only thing that could go wrong would be is if some iteration,

before we're spanning everything, when we scan the frontier around capital X, there

aren't any edges. That's the only way we can fail to

increase the vertices in capital X in a given duration.

But what would that mean? What would it mean if in some iteration

we couldn't find edges with one endpoint in capital X and the other endpoint in V

- X? Well then we would have exhibited an

empty cut. The cut X, V - X would have no crossing

edges. And now we can use the empty cut lemma,

which says if there's an empty cut, then the graph is disconnected.

But by assumption, we're working with a connected input graph, so that can't

happen. Okay? So the algorithm never gets stuck,

we always increase capital X by one vertex because the original graph was

connected, that means that halt was something spanning all of the verticies.

For the final step, we need to argue that Prim's algorithm never creates any cycles

in the edges that it, it's choosing capital T.

So, why are there no cycles? Well, what we're going to do is we're

going to talk about each edge in turn, the Prim's algorithm adds,

and argue that whenever a new edge gets added, there's no way that edge creates

any cycles in the set capital T. And, to see why, take a snapshot of the

algorithm of some given iteration, to the sum current set capital T, and

there's some set verticies capital X that the edges in T span.

V - X to the verticies not yet spanned by T and of course we can think of X, V - X

as a cut of the graph. And at this moment in time, at this

snapshot, the edges of capital T, they're all of one type.

They all have both of their endpoints inside capital X, none of them have any

endpoints inside V - X. So in particular, none of the edges

chosen thus far cross the cut X, V - X. That's by construction, they only span

the verticies of X. Now what type of edge is going to get

added in this iteration. Well, Prim's algorithm searches only over

edges that have one endpoint inside X and one endpoint outside.

That is, it searches only over edges that cross the cut X, V - X.

So the edge that gets added in this iteration is going to be a trailblazer

for this cut. None of the edges yet shows and cross the

cut, but the edge showed in this iteration

will definitely, cross the cut. So the moment edge E gets added to the

tree capital T, it is going to be lonely across the cut V sorry, X, V - X.

So by the lonely cut corollary as the sole member crossing this cut in capital

T, it cannot possibly participate in any cycles.

Remember, if it participated in a cycle in capital T, that cycle would have to

cross this cut somewhere else. But there aren't any other edges crossing

this cut, this is the only one. So that's why when we add a new edge,

there's no way it can create any cycles. It's the sole member crossing this

particular cut.