0:03

Next we're going to look at a slightly different graph processing application.

But it's a, a, a graph processing algorithm that's useful in many

applications we'll see in a minute. And slightly different than, depth and

breadth first search. It's called computing connected

components. Now I mentioned this a little bit when we

talked about basic definitions. So the idea is that if there's a path

between two vertices we say they're connected.

And what we want to do is reprocess the graph that is, build a data type that can

answer queries of the form, is V connected to W in constant time.

Now, we want to be able to do that for a huge, sparse graph of the type that

appears in practice. So we can't use the, if we could use the

adjacency matrix data structure, maybe we could do that but we can't.

So we're going to build a class that uses our standard representation, that will

enable clients to find connective components.

It's really interesting to think about this one.

We're getting the job done that we could get done if we get a huge, sparse matrix.

But if we have billions of vertices, there's no way we can have billions

squared in the matrix. So we have to find another way to do it.

So here's the data type that we want to implement.

So it's called and it's going to the constructor is going to build the data

structure that finds the connected components in the given graph to be able

to efficiently answer these connectivity queries.

It'll also be able to count the number of connective components.

And it also assigns an identifier from zero to count minus one, that identifies

the connective component that every vertex is in.

2:08

Now if you maybe, this sounds a little bit similar to what we did for the union sign

problem. So when you union find problem we're,

we're taking edges and we wanted to have queries that, that, well, well union is

like adding edge to the graph and then find is like are these things connected

yet. Now with union find, we found that we

couldn't quite answer the thing in constant time.

Are members are very slow growing on questions constant practical terms, but

not, not really. So it's less efficient than the algorithm

that we're going to talk about cause it doesn't quite get constant time.

On the other hand in another way it's better than the algorithm we're going to

talk about because you can either mix the unions and fines.

And in this case, because we're working with the graph it's like we're taking all

the unions and then we are handling fine requests.

So anyway what we're going to use to implement this is a depth first search

approach. So we'll do the depth first search, we'll

just keep different data then we did, than when we were findings paths.

So the algorithm is based on the notion that connection is an equivalence

relation. So recall an equivalence relation has

these three properties. Every vertex is connected to itself.

If v is connected to w then w is connected to v.

And if v is connected to w and w to x then v is connected to x.

And those basic operations underlie the algorithm that we're going to talk about.

So the equivalence relation is a, a general mathematical concept that implies,

in graph theory in this case. And as I already mentioned, in the case of

graph, it implies that. The vertices divide up into connected

components which are maximal sets of connected vertices.

So our sample graph has three connected components.

And what we'll do is assign identifiers to each one of the components in that will

for every vertex. Give us an identifier number for that

vertex. And that's the data structure then our

depth for search is going to built, and that immediately gives the constant time

implementation of the connectivity queries.

Given two vertices you look up their ID and if they're equal, they're in the same

connected component, if they're different they're not.

So that's the data structure that we're going to build it's like a unifying tree,

where there's just, all the trees are flat.

8:43

Okay so, now, we have to visit three. To visit three we have to check five and

four. Five was marked and nothing to do.

Four was marked, nothing to do. So we're done with three.

Done with three, we can continue the depth per search from five.

We have to check four and zero. Four was marked.

Zero was marked. So we're done.

And now we could continue with the depth per search of four.

We have to check six- three. Six was marked.

Three was marked and we're done. And we can,

Now six is done. And now we can continue with zero and we

have to check two. Check two it's not marked.

So we mark it and give it a connected component number of zero.

To visit two, all we do is check zero which is marked, so we're done.

And then we do the same thing with one, Unmarked.

So we visit it, I guess assign it zero. And then visit one.

And to do that, we check zero, which is marked.

So we're done with one. And then, to finish zero we have to check

five. And that's, marked so we don't have to do

anything. And we're done with zero.

So, now we're done with the first connected component.

But, we're not done with the whole graph. So what we want to do is, Go look for, so

that's a connected component, corresponding to zero.

Now we want, what we want to do is go look for an unmarked vertex.

So, we started at zero and we check one, two, three, four, five, and six.

And they're all marked. And so the next unmarked vertex that we

find in the graph is seven. So once we return from the depth first

search for zero, we Increment, counter. Which is our, how many connected

components have we seen. And now, we're going to assign that number

to everything that's connected to seven on the depth-first search on seven.

So what do we do to depth per research and sum we check eight.

8s on the mark. And so we go visit it.

So we assign it, connect the component of number of one same as seven.

And mark it. And then go ahead and recourse to visit

eight. We check seven which is marked.

So there is nothing to do. We're done with eight.

And then we're done with seven. So now we're done with all the vertices

that were connected to seven. We increment our counter of number of

components to two, and look for another vertex.

So now we check eight, which we already know is marked, connected to seven.

So now nine is unmarked so we're going to do a DFS from nine.

So everybody connected to nine is going to get assigned a connected component number

of two. So at nine, we have to check eleven and

that was unmarked. So we visit it.

Give it a two. To visit eleven, we have to check nine,

which is marked. So nothing to do.

And twelve, which is unmarked. So we visit it and, give it a number of

two. To visit twelve, we have to check eleven,

which is marked and nine which is also marked.

And then we're done with twelve. And then we're done with eleven.

And then, to finish doing nine, we have to check ten and twelve.

Ten is unmarked. So we mark it and give it a number of two.

To visit ten, we check nine which is marked, so we're done.

And then finally, to finish, the DFS. We check, twelve from nine, and that's

marked. So we're done with nine.

And now we keep looking. And we find that, ten, eleven, twelve, are

all marked. So we've completed the.

Computation. And for every vertex we have a connected

component number. And for any given query we can test

whether their in the same connected component simply by looking up that number

and seeing if it's equal. That's a demo of connected components

computation. Okay, so here's the code for finding

connected components with DFS. Which is, another straightforward DFS

implementation. Just like the other one.

And it just keeps, slightly, different, data structure.

So, The We keep the marked data structure, which is the vertices that we visited.

And then we keep this vertex index array ID which gives the identifier, the

component containing V, I think we call it on the demo.

And then a count of the number of components that we've seen.

So the constructor creates, the, marked array and it creates this idea array.

But now the constructor does more work than a single call on DFS.

What it does is it goes through, this is the constructor.

Goes through every vertex in the array in the graph.

And if it's not marked, it does the DFS. And that DFS will mark a lot of other

vertices. But when it's done that's all of those are

going to get assigned a value of count. And we're going to increment count, then

go and look for another unmarked vertex. Anything that wasn't marked by that first

DFX, DFS, we'll do a DFS from that one, and mark all its vertices with the next

value for the ID. So now let's look at the implementation of

DFS. It's recursive array just like the one

that we did. For, for path finding.

Except all that we do, when we mark a vertex.

We also simply set its ID to the current component name.

So all the vertices that are discovered in the same call of DFS have the same ID.

And to visit a vertex you go through all its adjacent vertices.

And any that are not marked you give a recursive DFS call.

Again, this code is amazingly compact and elegant.

When we're going through the demo step by step maybe you, you can see that the

underlying computation is actually kind of complex.

But, but, recursion and the graph-processing API that we set up

provides a compact and easy to understand implementation.

So that's using DFS to find connector components and then to return the idea of

a given vertex, just look at up in the array and to return then our components

just you can count and then you can build up the needed connectivity API from those.

So that's depth first search to find connected components.

I will just talk briefly about two applications from scientific applications.

So here's an application of sexually transmitted diseases at a high school.

And, simply the vertices are people, blue are men and pink are women.

And you I have an edge between if there was a contact.

And so it's obvious that you're going to be interested in connective components of

this graph. To be able to properly study sexually

transmitted diseases. These individuals had no contact with

these. Then, the, whichever one has the disease,

maybe it won't spread. Or if you add a new edge, then maybe you

have a problem. That's just one example of studying the

spread of disease. Here's another example that we use, for,

for that's similar to the flood fill example.

And this is processing data from a scientific experiment.

And in, in this case this image is comes from a photograph.

And the white things are particles that are moving.

And all we get is a image where it's a grey scale image.

And so, what we'll do to do this processing is, we want to identify the

movement of these particles over time. And the way we do it is build a grid

graph, like the one for the flood fill application and do an edge connecting two

vertices. If they're different,

He said their gray scale values is greater than less than some threshold.

And so then if you do that, and then find the connective components, then you can

identify blobs which correspond to real particles in this simulation.

And they do that every frame in a movie, then you can track moving particles over

time. So these are maybe fairly high resolution

images. These are graphs with lots and lots of

edges, and you need to be able to do this computation quickly in order to do this

scientific experiment. And we'd use this as an example.

Example in our first year programming course it is based on computing connected

components using depth-first search. So that's our third example of a graph

processing algorithm.