0:04

Okay. First, we're gonna look at the search algorithm for, digraphs and this is

the finding paths, what are all the vertices that we can get to from a given

vertex along a directed path. and again, this is, little more complex for a digraph

it would seem, than for a graph. so in this case, those are the, that's the set

of vertices that you can get to, from the given vertex x, s. Notice that, this set

is characterized by every edge crossing the boundary, it goes in. if there were an

edge that went out, that would give, another member of the set.. well actually,

looks more complicated to a human, but to the computer, it looks exactly, precisely

the same. in fact. the method that we looked at for undirected graphs is

actually a digraph processing algorithm. it treats, every connection between two

vertices as two directed edges, one in each direction. So, DFS, that we looked at

last time is actually a digraph algorithm. And we used precisely the same code. So to

visit a vertex V, we mark the vertex as visited, and recursively visit all

unmarked vertices W, that you can get to from V. let's look at the demo, just to, .

Reinforce that. So, here's a sample digraph with the edges over at the right.

let's look at that first search on that digraph. so, we're going to look at the

vertices that we can get to from vertex zero in this digraph. Again, we have two

vertex index to raise. one called marked, which says whether we can get there from

V, and the other called edge two, which gives us the vertex that took us there.

with that we can recover the paths from vertex zero to each vertex that can be

reached from vertex zero. So we start off by visiting vertex zero and now check the

edges that are adjacent to it with directed edges going out. so there's five

and then there's gonna be one. but five is unmarked so we have to re-cursedly visit

five. So we mark five, and we say we got there from zero. so the path from, to five

is zero to five. and so now we're gonna recursively visit all the unmarked

vertices pointed to from five . In this case it's just four. My four is

unmarked so we're gonna recursively visit four and say we got there from five. and

now recursively we have to check all the unmarked vertices pointing from four.

there's three. and two, first we do three and that's unmarked. so we gotta visit

three. and say that we got there from four. and now to visit three. We've looked

at all the vertices pointing from three. we can check five. We've already been to

five that's marked so we don't have to do anything. and then we check two. two is

unmarked so we continue with the depth first search and visit two. so now to

visit we mark two, and say we've got there from three. and now we check the vertices

that we can get two from two. In this case it's zero, which we've already been to.

And three, which we've already been to. so. now we're done with vertex two. and we

can return and continue the search from three well actually that was the last one

from three, so we're done with three as well. so now we're at four. We still have

checked the edge from four to two. so now we do that. And of course we've been to

two, so we don't have any further processing. and we're done with four. the,

, four was the only edge we get to from five. So we're going to be done with five

as well. and then, what about zero, well we have to check one. 1's not visited, so

we visit one, mark it. and we turn and then we're done with zero, and that gives,

the set of all vertices that are reachable from zero, and not only that the edge to

array. Gives the information that we need to reconstruct the path from any of those,

from zero to any of those vertices using precisely the same method that we used

before. We get the four from five, we get the five from zero, so zero, five, four is

the path to four. And we can do that for any vertex in that cell. Okay. So what

about the code? the code is exactly the same as for , undirected graphs. That's

the code for undirected graphs. that we looked at last time to get the code for

digraphs, we just changed the name, its the s ame code, otherwise. the recursive,

the constructor, builds the array of marked vertices, and also builds edge too,

just to, avoid clutter, left that one off this slide, and then it makes the call to

DFS. then the recursive DFS does the work. It marks the vertex and for every adjacent

vertex. If its not marked, it does the DFS. and then the client can ask whether

any ver-, any given vertex is reachable from s after the constructor has done its

work. that's depth-first search in directed graphs, actually, we already did

it. now here's just a couple of applications where this kind of code is

used. one is, a so called program control flow analysis. actually every program can

be viewed as a digraph. where, the vertices are basic blocks of instructions

that are just executed one after the other with no conditionals. and then edges

represent, a, jump. If there's an if statement, vertex left, two edges going

out of it, or, or a loop, which involves a conditional. So, , analyzing a program,

people write systems in analyse program, to look at their structure by studying

their diagrams, for example. one thing that happens often is there's unreachable

code. another, another thing you might want to do is determine whether you can

get to this exit or not, by doing this digraph processing. so that's actually a

widely used technique in, in developing software, software development. to try to

improve code by doing this kind of digraph-processing. And ofcourse these

digraphs can be huge. another classic use of depth for search in digraphs is garbage

collection that is used in systems like Java where data structures or digraphs. we

build objects and then we create references to other objects. and so the

data that any programs use is really set as a digraph. So there's the idea of

roots, so your program has e-. Some. live objects, that it can access through,

whatever state, it's in, but, a language like Java, there's, automatic garbage

collection, which means the programmer, when it's done with an object, maybe it

overwrites one of these pointers or something. there's gonna be some, blocks,

that, are not directly accessible by the program. and so, what's interesting is,

the set of reachable objects that, can be indirectly access. By the program starting

and following a chain of pointers. so those are the ones that can't be collected

or reclaimed by the system for reusing the memory. But all the other ones, the gray

ones that can't be reached by the program there's no reason to . Keep them live, you

may as well collect them and return them for use, re-use. So there's a so-called

marked and sweep out rhythm that actually dates back to 1960, where. We run DFS to

mark all ritual objects. and then go through and sweep through all possible

objects. And if it's object is unmarked it's garbage so add it to the list of free

memory. and that's a classic method that's still widely used. it uses an extra bit

per object'cause you have to have to have that for the mark. But still, it's

effective and useful digraph solution. So DFS with reachability that we've just

showed and path finding is similar. And there's a couple of other simple digraph

problems that we'll consider. These are so far examples. But it's also interesting

that DFS is the basis for solving digraph problems that are not so simple or

immediate to solve. And this was pointed out 40 years ago by Bob Tarjan in a

seminal paper that showed that. First search, can allow us to solve problems

that seem pretty complicated actually, in linear time, and we're gonna look at an

example of that later on. so that's depth for search. what about breadth for search.

Well in the same way that we saw for depth for search every undirected graph is

actually a digraph that has edges in both direction. So bfs is really a directed

graph algorithm and we can use exactly the same code to find shortest paths from a

source to any given vertex. So we use a que. We put the source on a que and mark

it as visited and. And as long as the queue is non-empty, we remove the least

recently added vertex and add to the que ue and mark as visited all the unmarked

vertices that you can get to from that vertex. And the same proof shows that BFS

computes shortest paths, the ones with the fewest number of edges from S to each

other vertex in the digraph in time proportional to in linear time. So you

want the GPS in your car, you BFS when you're driving around lower Manhattan. So,

let's look at the demo again just to see, the distinction between, breadth first

search, in digraphs and see how it works. So this is a, a small graphing, a smaller

digraphic and with six vertices and eight edges. we take a Q and we take a source

vertex and put on the Q to get started. then, Q is not empty, so remove zero and

we check all, all vertices that are adjacent that we get to. So. we're gonna,

in zero was zero distance, from zero, so first we will check two. and that one is

not marked, so we mark it and put it on the queue. and then we'll check one. and

that one's not marked, so we mark it and put it on the queue. then we're done with

zero. now queue's not empty so we pull the least recently added off, that's two. and

now we're going to check the vertices, you can get from two. I noticed both one and

two are distance one from zero. And now, since we're going from two, everything

that we encounter will be distance two from the source. So we find four, it's

distance two from the source, and we get there from vertex two. Unmarked, so we

fill in those data structures and put it on the queue. and then we're done with

two, so we go back to the queue, and 1's on the queue. So we pull one off and it's

distance one from zero. Remember the first showed that everything in the queue is one

of two distance, either k or k plus one. In this case, we've got one at distance

one, four at distance two. So now we're going to pull one off the queue. M- and

look at the edges you can get to, places you can get to from one. Now we check two

but that's already marked so we ignore it. and then we're done with one. Now four is

left on the Q so we pull it off and check adjacent vertices. In this case three,

it's unmarked so we put it on the Q. Then we're done with four. Then from three we

check five and that's unmarked and it's one more distance from the source so we

put it on the Q. And then finally. Oh we check two which we already visited so we

don't have to, to do anything. And then finally we pull five off the Q. check. Or

you get two from five and it's zero, which is marked, so we're done. and so that's

breadth-first search whig, which gives us this directed tree from the source. Which

gives the shortest path to all the vertices that you can get to from that

source. you can use a version of this to solve a more general problem known as the

multiple-source shortest paths problem. In this problem you're given a digraph and a

set of source vertices, and you want to find the shortest path from any vertix in

the set to each other vertix. So for example, in this case if the set is one,

seven, and ten, what's the shortest path to four? From one of those vertices. Well,

it turns out in this case to be seven, six to four. Shortest path to five is seven,

six, zero, five. Shortest path to twelve is ten to twelve. That's a more general

problem but it's actually easy to solve. How do we implement this? We just use a

different constructor. We just use BFS but initialize by, put all the source vertices

on the queue to get started. So that is every vertex is, so you put those on the

queue and they're zero from the desired source. And then any vertex you can get to

from any one of those is going to be. One and so forth, so the results still gives

away. The edge to array will still give a way to get from any vertex, the shortest

way to get from any vertex to each of the sour-, source vertices. here's an

application of depth-first search. Let's say you want to crawl the whole web, well,

all the web that you can access from some starting web page, say like Princeton's

starting webpage. Again, the digraph model, each vertex is a webpage, each edge

is a link on that webpage to some other webpage. and so all we want to do is get

to all the other vertices on the web. And, so solution is, well, we don't actually

build the digraph we just use an implicit digraph, because for every web page we can

find the links to other web pages on it and we'll just build those as we encounter

them. So we're gonna start with a source, which is the root web page. We're gonna

have a queue of the sites that we still need to explore. what we're going to do is

also have a set of discovered websites, this corresponds to our marked array but

since we don't know how many vertices there are on the web all we're going to do

is keep track of the ones that we've been to. so this is, don't use the vertex

indexed array we don't even bother with because we'll just use the vertex names

and do the look up as indicated we could do. So all we do is In the, is but for a

search the same method is if the queue's not empty. Take a website off of the cue.

And just add to the queue all the websites to which it links. but all of those

websites, you check whether they're in the set of the ones that you've seen already.

now, you might run out of space, for this set before you get to the whole web. but

anyway, conceptually, this is a way that you could go. one thing to think about is

why not use DFS for this. well One reason is you, you're gonna go far away in your

search for the web. Maybe, maybe that's what you want, but the real problem, in

point of fact is there's some web pages that would trap such a search by creating

new web pages and make links to'em the first time that you visit'em. So, they,

trap searches like that by, cuz DFS would always go to a new web page like that and

it'd keep creating new ones and you wouldn't get very far. With breadt-first

search you're taking a wide search of the pages that are close. and that's often

maybe what you want for such a search. and, just to how simple the idea is, this

is complete code for a, it's kind of a bare bones web crawler but it'll get to a

lot of websites. so let's look at, do this example because again it really indicates

the power of using appropriate abstractions to implement the algorithms

that we're interested in. so this one we're gonna use a cue. Queue of strings so

that's the websites that we have to still yet to go to. and then a set of strings is

gonna be the ones that we've already been to that's equivalent to the marked array.

we'll start with Princeton's website. add it to the queue of places we have to go

and also mark it. Discovered that ad just means mark it. so now, while the queue's

not empty. What we're going to do is read the raw HTML from the next website in the

queue. So this is code using our input library that gets that job done. And then,

this is a little fooling around with regular expressions. We'll talk about

algorithms like this later on, that essentially tries to find all URLs within

the text of that website. And for all of those URL's. And we'll. Look at workers

behind this code later on in this course. For all those URL's it's gonna check. If

it's marked that's discovered doc contains and if it's not marked it'll say it will

mark it and put it on the queue. a very s- simple program with a very powerful effect

that illustrates breadth-first search.