0:00

We're now ready to present all the details of the Kruskal Algorithm.

Namely, we will prove that the Kruskal strategy is optimal,

it produces a spanning tree of minimum total weight,

and we will also present implementation details of this algorithm.

Recall that the idea of this algorithm is the following.

We start with an empty set X, and we repeatedly add to this set

the next lightest edge that doesn't produce a cycle.

So it is not difficult to see that

at any point of time the set of edges X forms a forest.

That is, a collection of trees.

Let me illustrate this.

Assume that we have some set of vertices and initially, the set X is empty,

which means that each of our vertices forms a tree of size 1,

namely a tree that contains 1 vertex and no edges.

Initially each vertex is isolated in our set X.

Now, we start adding edges.

Probably, this is the first one, then we add this edge, then this edge,

then this edge, then this edge, for example.

At this point of time, our set X consists of three trees.

So this is the first tree, T1.

This is the second tree, T2.

And this is the third tree, T3.

In particular, the tree T3 contains just one vertex.

It is an isolated vertex, okay?

2:38

Consider the following cut: one part is going to be the set S, so

this is one part of our partition, and

all other vertices form the other part of this partition.

In this case, we see that e is the lightest edge

that joins two vertices from different parts.

This means, in turn, that the cut property justifies that adding

e in this case is indeed a safe move.

In other words, if our current set X is a part

of some optimal spanning tree, then X with

e added is also a part of some minimum spanning tree.

3:25

Once again, initially, the set X in the Kruskal algorithm is empty,

which means that each vertex of our initial graph

forms a separate connected component.

So this is how the set X looks initially.

So each vertex lies in a separate connected component.

Then we start adding edges to x.

This creates a forest.

In this forest currently, we have three trees.

This is the first tree.

This is the second one.

And this is the third one.

Assume now that the next lightest edge that Kruskal's Algorithm

considers is the following one.

4:17

First of all, we need to be able to check whether it joins two vertices that lie

in the same tree or in other words, that lie in the same connected component.

In this case, they lie in the same connected component, so

Kruskal's Algorithm will not add this edge to the set X,

because otherwise, it would produce a cycle in our set X.

Now, assume that the next edge that Kruskal's Algorithm tries is the following.

Again, we need to check whether the corresponding two end points lie in

the same connected component.

In this case, it is not so.

They lie in different connected components.

So we add this edge, and at this point, we should update the data structures that

we use to indicate that we have now merged the trees T1 and T2.

So what we need from our data structure is to check whether two given vertices

lie in the same set, or in the same connected component, and also, if they lie

in different connected components, to merge the corresponding two trees.

So the perfect choice for data structure for this algorithm is, of course,

the disjoint sets data structure.

Once again, to check whether two given vertices lie

in different connected components, we just check whether find of

one endpoint is equal to find of the other endpoint of this edge.

If they are different, then they lie in different connected components.

And when adding an edge to the set X, we need to merge the corresponding two trees,

and this is done by calling the method union of the corresponding two endpoints.
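These two operations can be sketched with a minimal union-find structure in Python; the class and method names here are illustrative, not from any particular library.

```python
# A minimal disjoint sets (union-find) sketch for Kruskal's two operations:
# checking whether two vertices are in the same component, and merging trees.
# Names are illustrative; no heuristics (rank, compression) are used yet.

class DisjointSets:
    def __init__(self, n):
        # Each vertex starts as the root of its own one-element tree.
        self.parent = list(range(n))

    def find(self, v):
        # Walk up to the root; two vertices share a component
        # exactly when they share a root.
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, u, v):
        # Merge the two trees by hanging one root under the other.
        self.parent[self.find(u)] = self.find(v)

ds = DisjointSets(4)
ds.union(0, 1)
print(ds.find(0) == ds.find(1))  # True: adding edge (0, 1) would create a cycle
print(ds.find(2) == ds.find(3))  # False: edge (2, 3) is safe to add
```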

We will now illustrate this on a toy example.

This is a toy example where we have six vertices.

Let's first call them A, B, C, D, E, F, and

let's assume that we have a disjoint sets data structure, and

let me show the contents of this data structure.

So initially, each vertex lies in a separate set.

Now we start processing edges in order of non-decreasing weight.

So the first lightest edge is AE.

We check whether A and E, at this point,

lie a different connected components.

To do this, we call find of A and find of E.

This gives us different IDs because they lie in different sets.

So we add this edge to our current solution and

we also notify our data structure that now A and

E actually lie in the same connected component.

So now it looks like this.

The next lightest edge is the edge CF.

Again we ask our data structure whether C and F belong to the same set and

it replies that they do not belong to the same set, because find of C is

not equal to find of F, so it is safe to add this edge to our solution.

We also notify our data structure that C and

F now lie in the same set by calling union of C and F.

So now C and F lie in the same set.

7:56

The next edge is AD, and we see that A and

D lie in different connected components, so we just add this edge to our solution and

also notify our data structure that we need to merge the sets

that contain the vertex A and the vertex D.

So now, we have three different disjoint sets in our data structure,

which actually corresponds to vertices of our three trees.

So the first tree contains vertices AED,

the second one contains the vertex B, and the last one contains vertices C and F.

Now, the next lightest edge is DE, it has weight 3.

However, we see that D and E belong to the same connected component.

This, in turn, means that if we added the edge DE to our current solution,

this would produce a cycle.

So we just skip the edge DE, and we continue to the next lightest edge.

The next one is AB, and we see that A and B lie in different connected components,

so it is safe to add the edge AB to the current solution.

We also need to merge the corresponding two sets.

9:44

The next edge has weight 8, and at this point, we also merge the corresponding two sets.

And now, all our vertices lie in the same connected component,

which means that we constructed an optimal spanning tree,

that is a spanning tree of minimum total weight.
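The walkthrough above can be replayed in code. Note that the lecture only gives two of the weights (DE = 3 and the final edge's 8), so the remaining weights, and the identity of the final edge (taken to be EF here), are assumptions made for illustration.

```python
# Replaying the toy example with a dictionary-based union-find.
# Only DE = 3 and the final weight 8 come from the lecture; the other
# weights and the choice of EF as the final edge are assumed.

parent = {v: v for v in "ABCDEF"}

def find(v):
    while parent[v] != v:
        v = parent[v]
    return v

edges = [(1, "A", "E"), (2, "C", "F"), (2, "A", "D"),
         (3, "D", "E"), (4, "A", "B"), (8, "E", "F")]

X = []
for w, u, v in edges:              # already sorted by weight
    if find(u) != find(v):         # endpoints in different trees: safe
        X.append((u, v))
        parent[find(u)] = find(v)  # union: merge the two trees
    # otherwise skip: e.g. DE would close a cycle inside {A, D, E}

print(X)  # [('A', 'E'), ('C', 'F'), ('A', 'D'), ('A', 'B'), ('E', 'F')]
```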

The pseudocode of the Kruskal algorithm looks as follows.

First, for each vertex in our graph, we create a separate disjoint set.

We do this by calling MakeSet method of disjoint sets data structure.

Then we initialize the set of edges X with the empty set.

The next step is that we sort the edges, all the edges of our graph, by weight.

Then we process all the edges in order of non-decreasing weight.

This is done in this for loop.

Then, for each such edge, we need to check whether adding it to X is safe or not.

Namely, whether it produces a cycle or not.

To do this, we just check whether u and v belong to different connector components.

For this, we check whether find of u is equal to find of v or not.

If they're different, then they lie in different connected components.

In this case, it is safe to add the edge {u, v}

to the set X, which is done in this line.

We also need to notify our data structure that all

the vertices that previously lay in the connected components of u and of v

now lie in the same connected component, because we just joined these two trees,

and this is done by calling union of u and v.

Finally, in the end, we just return the resulting set X.
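Under these assumptions, the pseudocode translates into the following Python sketch, with union by rank as the merging heuristic (the 0-to-n-1 vertex labeling is a convention of this sketch):

```python
def kruskal(n, edges):
    """n vertices labeled 0..n-1; edges is a list of (weight, u, v) triples."""
    parent = list(range(n))  # MakeSet for every vertex
    rank = [0] * n

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    def union(u, v):
        ru, rv = find(u), find(v)
        if rank[ru] < rank[rv]:   # union by rank: hang the shorter
            ru, rv = rv, ru       # tree under the taller one
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1

    X = []                        # the set of tree edges
    for w, u, v in sorted(edges):         # non-decreasing weight
        if find(u) != find(v):            # adding (u, v) creates no cycle
            X.append((u, v))
            union(u, v)
    return X

print(kruskal(4, [(1, 0, 1), (2, 0, 2), (2, 1, 2), (3, 2, 3)]))
# [(0, 1), (0, 2), (2, 3)]
```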

It remains to estimate the running time of the Kruskal's algorithm.

We start by sorting the edges, so this requires big O(E log E) time.

This, in turn, can be rewritten as big O(E log of V squared),

just because a simple graph has at most V squared edges.

This, in turn, can be rewritten as O(2E log V), which is just O(E log V).
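Written out, this chain of bounds for the sorting step is:

```latex
O(E \log E) = O\bigl(E \log V^2\bigr) = O(2E \log V) = O(E \log V)
```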

12:42

Then we also make at most V minus one calls to the union procedure.

Why V minus one?

Well, just because initially we have V connected components.

Namely, when the set x is empty,

each vertex of our graph forms a separate connected component.

Then each time when we call union, we actually merge two connected components.

And in the end of the run of our algorithm,

we have just one connected component.

So all the vertices lie in the same tree.

So initially, we have V connected components and in the end we have 1, and

each time we call union, we reduce the number of connected components by 1, which

means that we make exactly V minus 1 calls to the union procedure.

Okay, so we have roughly E calls to find and roughly V calls to union procedure.

Now recall that if we implement the disjoint sets data structure as a forest, or

as a collection of disjoint trees, and we use the union by rank

heuristic, then the upper bound on the running time of each

iteration is just O(log V), which gives us the bound big O((E plus V) times log V).

13:57

Recall also that in our case,

the graph is connected, which means that E is at least V minus 1,

which in turn means that E plus V is at most 2E.

This allows us to rewrite the bound as follows.
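In symbols, connectivity gives:

```latex
E \ge V - 1 \;\Longrightarrow\; E + V = O(E)
\;\Longrightarrow\; O\bigl((E + V)\log V\bigr) = O(E \log V)
```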

So the upper bound on the running time of the second

step is actually the same as for the first step.

It is O of E log V, which gives us that the upper bound

on the running time of the whole algorithm is big O(E log V).

Now recall that we actually know that if, for our implementation

of the disjoint sets data structure, we use both the union by rank heuristic and

the path compression heuristic, then we can state a stronger upper bound.

That is, instead of log V here,

we actually have log star of V, the iterated logarithm.

This gives us a stronger upper bound for the second step.

Unfortunately, this doesn't improve the total

running time, because the upper bound for

sorting all the edges still dominates the upper bound for the second step.

However, for some applications, the edges are given to us already in sorted order,

which means that, in this case,

we do not need to spend E log V time for sorting the edges.

So that the total running time in this case becomes equal to E times log* of V.

Which makes the Kruskal algorithm in this case even more efficient.

So in this case, the running time is upper bounded

by E times log star of V, which is nearly linear.