0:36

So that's just a way to encode an ordering, and

then here's really the important property that

every directed edge of G goes forward in the ordering.

That is if u v is directed edge of the directed graph G,

then it should be that the f value of the tail is less than the f value of the head.

That is, the head of this directed edge gets the higher f value, so the edge goes

forward when traversed in its direction.

Let me give you an example just to make this more clear.

So suppose we have this very simple directed graph, with four vertices.

Let me show you two different,

totally legitimate topological orderings of this graph.

So the first thing you could do,

is you could label s 1, v 2, w 3, and t 4.

Another option would be to label them the same way,

except you can swap the labels of v and w.

So if you want, you can label v 3 and w 2.

So again,

what these labelings are really meant to encode is an ordering of the vertices.

So the blue labeling, you can think of as encoding

the ordering in which we put s first then v then w and then t.

Whereas the green labeling can be thought of as the same ordering of

the nodes except with w coming before v.

What's important is that the pattern of the edge is exactly the same in both

cases, and in particular all of the edges go forward in this ordering.

So in either case we have s with edges from s to v, and s to w.

2:11

So that looks the same way pictorially, whichever order v and w are in, and

then symmetrically there are edges from v and w to t.

So you'll notice that no matter which order that we put v and w in,

all four of these edges go forward in each of these orderings.

Now if you try to put v before s it wouldn't work because the edge

from s to v would be going backward if v preceded s.

Similarly, if you put t anywhere other than the final position,

you would not have a topological ordering.

So in fact,

these are the only two topological orderings of this directed graph.

I encourage you to convince yourself of that.
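To make the defining property concrete, here's a minimal Python sketch using the four-vertex example from above; the function just checks that every directed edge goes forward under a given labeling:

```python
def is_topological_ordering(edges, f):
    """f is a topological ordering iff every directed edge (u, v)
    goes forward in the ordering, i.e. f[u] < f[v]."""
    return all(f[u] < f[v] for (u, v) in edges)

# The four-vertex example: s -> v, s -> w, v -> t, w -> t.
edges = [('s', 'v'), ('s', 'w'), ('v', 't'), ('w', 't')]
blue  = {'s': 1, 'v': 2, 'w': 3, 't': 4}  # first labeling
green = {'s': 1, 'v': 3, 'w': 2, 't': 4}  # v and w swapped
bad   = {'v': 1, 's': 2, 'w': 3, 't': 4}  # v before s: edge (s, v) goes backward
```

The `blue` and `green` labelings both pass the check; the `bad` one fails, exactly because the edge (s, v) would point backward.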

Now, who cares about topological orderings?

Well, this is actually a very useful subroutine.

This comes up in all kinds of applications.

Really, whenever you want to sequence a bunch of tasks when there are precedence

constraints among them.

By precedence constraint I mean one task has to be finished before another.

You can think, for example, about the courses in some kind of undergraduate

major like a computer science major.

Here the vertices are going to correspond to all of the courses and

there's a directed edge from course A to course B if course A is a prerequisite for

course B, if you have to take it first.

So then of course, you'd like to know a sequence in which you can take these

courses so that you always take a course after you've taken its prerequisites.

And that's exactly what a topological ordering will accomplish.

So it's reasonable to ask the question when does a directed graph

have a topological ordering and when a graph does have such an ordering,

how do we get our grubby little hands on it?

Well, there's a very clear necessary condition for

a graph to have a topological ordering, which is that it had better be acyclic.

Put differently, if a directed graph has a directed cycle

then there is certainly no way there is going to be a topological ordering.

6:24

By assumption there's no sink vertex, so this isn't a sink vertex, so

there's an outgoing arc, so let's follow it.

We get to some other node.

That also has an outgoing arc, let's follow that, and so on.

So we just keep following outgoing arcs, and we do this as long as we want

because every vertex has at least one outgoing arc.

Well, there's a finite number of vertices; say this graph has n vertices.

So if we follow n arcs, we are going to see n+1 vertices.

So by the pigeonhole principle, we're going to have to see a repeat.

Right, so if we see n+1 vertices among only n distinct vertices,

we're going to see some vertex twice.

So for example, maybe after I take the outgoing arc from this vertex,

I get back to this one that I saw previously.

Well, what have we done?

What happens when we get a repeated vertex?

By tracing these outgoing arcs and repeating a vertex,

we have exhibited a directed cycle.

And that's exactly what we're assuming doesn't exist.

We're talking about directed acyclic graphs.

So, put differently,

we just proved that a graph with no sink vertex has to have a directed cycle.

So a directed acyclic graph therefore has to have at least one sink vertex.
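The argument is constructive, so it can be sketched directly in Python. This is an illustrative version, assuming the graph is given as an adjacency-list dict mapping each vertex to a list of its out-neighbors: follow outgoing arcs until some vertex has none, and report a cycle if a vertex ever repeats.

```python
def find_sink(graph):
    """Follow outgoing arcs from an arbitrary start vertex.
    In a DAG this terminates at a sink within n steps; if we
    ever revisit a vertex, we have exhibited a directed cycle."""
    u = next(iter(graph))        # arbitrary start vertex
    seen = {u}
    while graph[u]:              # u still has an outgoing arc
        u = graph[u][0]          # follow any outgoing arc
        if u in seen:            # repeated vertex: pigeonhole kicked in
            raise ValueError("directed cycle found; not a DAG")
        seen.add(u)
    return u                     # no outgoing arcs, so u is a sink
```

On the four-vertex example, this walk always ends at t, the unique sink; on a graph with a directed cycle it raises instead of returning.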

8:39

So let v be a sink vertex of g,

if there's many sink vertices, we pick one arbitrarily.

We set v's label to be the maximum possible, so

if there's n vertices we're going to put that in the nth position.

And now we just recurse on the rest of the graph, which has only n-1 vertices.

So how would this work in the example on the right?

Well in the first iteration, or the first outermost recursive call,

the only sink vertex is this right-most one, circled in green.

So there's four vertices, so we're going to give that the label 4.

So, then having labeled that 4,

we delete that vertex and all the edges incident to it.

And we recurse on what's left of the graph, so

that would be the left-most three vertices plus the left-most two edges.

Now, this graph has two sink vertices, after we've deleted the vertex labeled 4 and

all the edges incident to it.

So both this top vertex and this bottom vertex are sinks in the residual graph.

So now in the next recursive call, we can choose either of those as our sink vertex.

Because we have two choices, that generates two topological orderings.

Those are exactly the ones that we saw in the example.

But if, for example,

we choose this one to be our sink vertex, then that gets the label 3.

Then we recurse on just the two remaining northwestern-most vertices and the edge between them.

This vertex is the unique sink in that graph, that gets the label 2.

And then we recurse on the one-node graph that remains, and that gets the label 1.

So, why does this algorithm work?

Well, there's just two quick observations we need.

So, first of all, we need to argue that in every iteration, or

in every recursive call, we can indeed find a sink vertex,

which we can assign to the final position that's still unfilled.

And the reason for that is just if you take a directed acyclic graph and

you delete one or more vertices from it,

you're still going to have a directed acyclic graph, right?

You can't create cycles by just getting rid of stuff.

You can only destroy cycles, and we started with no cycles.

So through all of the intermediate recursive calls we still have no cycles, and

so by our first observation there is always a sink vertex to pluck off.
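Putting the two observations together, the recursive algorithm can be sketched in Python as follows. This is an illustrative, quadratic-time version (the graph is assumed to be an adjacency-list dict); the DFS-based version discussed next achieves linear time.

```python
def topo_sort_recursive(graph):
    """Repeatedly pluck off a sink vertex, assign it the highest
    unfilled position, and delete it from the residual graph."""
    residual = {u: set(vs) for u, vs in graph.items()}  # mutable copy
    f = {}
    for position in range(len(residual), 0, -1):
        # a DAG always has a sink: a vertex with no outgoing edges
        sink = next(u for u in residual if not residual[u])
        f[sink] = position
        del residual[sink]
        for out in residual.values():   # delete edges into the sink
            out.discard(sink)
    return f
```

On the running example this assigns t the label 4 first, then peels off the remaining three vertices one at a time.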

13:32

And if you find any vertex that you haven't seen before,

you immediately start recursing on that node.

So I said the first modification we need is to embed this into an outer for

loop to ensure that every single node gets labeled.

So I'm going to call that subroutine DFS-Loop.

It does not take a start vertex.

At initialization, all nodes start out unexplored.

And we're also going to keep track of a global variable,

which I'll call current_label.

This is going to be initialized to n,

and we're going to count down each time we finish exploring a new node.

And these will be precisely the f values.

These will be exactly the positions of the vertices

in the topological ordering that we output.

In the main loop we're going to iterate over all of the nodes of the graph.

So for example, we just do a scan through the node array.

As usual, we don't want to do any work twice, so if a vertex has already

been explored in some previous invocation of DFS, we don't search from it.

This should all be familiar from our embedding of breadth first search in a for

loop when we computed the connected components of an undirected graph.

And if we get to a vertex v of the graph, that we haven't explored yet,

then we just invoke DFS in the graph, with that vertex as the starting point.

So, the final thing I need to add is I need to tell you what the f values are,

what the actual assignments of vertices to positions are.

And as I foreshadowed, we're going to use this global current_label variable.

And that'll have us assign vertices to positions from right to left.

Very much mimicking what was going on in our recursive solution,

where we plucked off sink vertices one at a time.

So, when's the right time to assign a vertex its position?

Well, it turns out,

the right time is when we've completely finished with that vertex.

So we're about to pop the recursive call from the stack

corresponding to that vertex.

15:30

And that's it, that is the entire algorithm.

So, the claim is going to be that the f values produced,

which you'll notice are going to be the integers from n down to 1,

because DFS will be called eventually once on every vertex, and

it will get some integer assignment at the end.

And everybody is going to get a distinct value, and the largest one is n and

the smallest one is 1.

The claim is that is a topological ordering.

Clearly this algorithm is just as blazingly fast as DFS itself,

with just a trivial amount of extra bookkeeping.
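Here's a minimal Python sketch of DFS-Loop, assuming the graph is an adjacency-list dict (recursive DFS for clarity; an implementation for a big graph would use an explicit stack to avoid recursion-depth limits):

```python
def topo_sort(graph):
    """DFS-Loop: assign each vertex an f value, counting down
    from n, at the moment its DFS call completely finishes."""
    explored = set()
    f = {}
    current_label = len(graph)       # initialized to n

    def dfs(u):
        nonlocal current_label
        explored.add(u)
        for v in graph[u]:           # every outgoing arc of u
            if v not in explored:
                dfs(v)
        f[u] = current_label         # u is completely finished
        current_label -= 1

    for u in graph:                  # outer loop over all the nodes
        if u not in explored:
            dfs(u)
    return f
```

If the outer loop happens to consider the vertices of the running example in the order v, t, s, w, this produces f(t) = 4, f(v) = 3, f(w) = 2, f(s) = 1, matching the trace worked through below.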

Lets see how it works on our running example.

So lets just say we have this four node directed graph that we're getting

quite used to.

So this has four vertices, so

we initialize the current label variable to be equal to 4.

So let's say that in the outer DFS loop,

let's say we start somewhere like the vertex v.

So notice in this outer for

loop, we wind up considering the vertices in a totally arbitrary order.

So let's say we first call DFS from this vertex v.

So what happens, well, the only place you can go from v is to t, and then at t,

there's nowhere to go.

So we recursively call DFS at t, there's no edges to go through, we finish the for

loop, and so t is going to be assigned an f value equal to the current label,

which is n, and here, n is the number of vertices, which is 4.

So f(t) gets the assignment, the label 4.

So then now we're done with t, we backtrack back to v.

We decrement the current label as we finish up with t; back at v,

there are no more outgoing arcs to explore, so v's for loop finishes.

So we're done with v in depth-first search.

So v gets the new current label, which is now 3, and again,

having finished with v, we decrement the current label, which is now down to 2.

So now we go back to the outer for loop,

maybe the next vertex we consider is the vertex t.

But we've already been there, so we don't bother to DFS on t.

And then maybe after that, we try it on s.

So maybe s is the third vertex that the for loop considers.

We haven't seen s yet, so we invoke DFS, starting from the vertex s.

From s, there's two arcs to explore. The arc to v won't do anything,

since we've already seen v.

But on the other hand, the arc (s,w) will cause us to recursively call DFS on w.

From w, we try to look at the arc from w to t, but we've already been to t, so

we don't do anything.

That finishes up with w, so depth-first search then finishes up at the vertex w,

w gets the assignment of the current label.

So f(w) = 2.

We decrement current label, now its value is 1.

Now we backtrack to s, so we've already considered all of s's outgoing arcs, so

we're done with s.

It gets the current label, which is 1.

And this is indeed one of the two topological orderings of this graph

that we exhibited a couple slides ago.

So that's the full description of the algorithm and

how it works in a concrete example.

Let's just discuss what are its key properties, its running time and

its correctness.

So, as far as the running time of this algorithm goes, it is linear.

It's exactly what you'd want it to be.

And the reason the running time is linear is for

the usual reasons that these search algorithms run in linear time.

You're explicitly keeping track of which nodes you've been to so

that you don't visit them twice, so you only do a constant amount of work for

each of the n nodes.

And each edge, in a directed graph, you actually only look at each edge once,

when you visit the tail of that edge.

So you only do a constant amount of work per edge as well.

Of course, the other key property is correctness.

That is, we need to show that you are guaranteed to get a topological ordering.

So what does that mean?

That means every arc travels forward in the ordering.

So if (u,v) is an edge, then f(u), the label assigned

to u in this algorithm is less than the label assigned to v.

The proof of correctness splits into two cases,

depending on which of the vertices u or v is visited first by depth-first search.

Because of our for loop, which iterates over all of the vertices of the graph g,

depth-first search is going to be invoked exactly once from each of the vertices.

Either u or v could be first, both are possible.

So first let's assume that u was visited by DFS before v, so then what happens?

Well, remember what depth first search does, when you invoke it from a node,

it's going to find everything findable from that node.

So if u is visited before v, that means v has not yet been explored when DFS is

invoked at u, so it's a candidate for being discovered.

Moreover, there's an arc straight from u to v, so

certainly DFS invoked at u is going to discover v.

Furthermore, the recursive call corresponding to the node v is going to

finish, it's going to get popped off the program stack before that of u.

The easiest way to see this is just to think about the recursive structure of

depth-first search.

So when you call depth-first search from u, that recursive call, that's going to

make further recursive calls to all of the relevant neighbors including v, and

u's call is not going to get popped off the stack until v's has been popped first.

That's because of the last in, first out nature of a stack or

of a recursive algorithm.