[MUSIC] All right, so the final category of graph analytics tasks are these pattern-matching tasks. And this is where we wanna maybe spend a little bit of time, especially cuz they are somewhat relevant in the news lately, with this PRISM system. And I'll try to touch on that in a bit. Okay, soere are, the idea is to find all instances of a particular subgraph pattern. And so a very simple subgraph here is two vertices that both connect to each other, okay? And so here you may have conditions on the vertex labels or on the edge labels in order to more precisely specify the exact pattern you're looking for. But you're looking for all instances of this pattern across the graph, okay? So how many times does this pattern appear in this graph? Or what are all the instances in this graph? How can you instantiate this pattern there? Well, here is one from a to b you instantiate $x = a and $y = b, and that seems to fit. And then g to c seems to fit. $x = c and $y = g. And then actually if we're not careful, you have, ar perhaps it's what you want, but generally you don't. You'll have duplicates of this where $x can be b and $y would be a. Whoops. That will be a. And $x will be g. And $y will be c. So maybe four instances of this pattern, only two of which are in some sense unique. So popular pattern-matching problem is to find the triangles in a graph. And so a triangle is a sequence of three vertices that are connected to each other. So a connects to b, b connects to c, and c connects back to a. And so the total number of triangles is a measure of connectedness. There's lots of algorithms to compute this measure to count the number of triangles, and it's a popular challenge problem, kind of benchmark for computer scientists to compare different systems and compare different algorithms. Its utility in practice as far as actually getting information out of a graph has not been made so clear to me when I talk to people who compute this, it seems to be more of a, again, a good benchmark problem for systems as opposed to a actually informative for central network analysis. But that's not to say that it's not used ever. So for that reason I'm not gonna go into a lot of detail in the algorithms here. But a key idea to remember here is that a naive algorithm will actually find the same triangle three times, right? a connects to b, b connects to c, and c connects back to a, and then b, c, a, b, and then c, a, b, c are all logically the same triangle and that they involve the same nodes. But if you're not careful, you'll find these, all of these. And so this gives us an opportunity to come up with much faster algorithms. I think the other reason why it's studied in a lot of detail is that it's perhaps the simplest possible pattern you can find, so it's a good starting point to do more complicated pattern analysis. And in fact the pattern analysis problem in general is very difficult problem, is a very expensive problem, which leads to a bunch of approximation techniques. And in fact even for triangles, there's a variety of algorithms that don't count the exact number of triangles, but give you a good approximation with certain bounds. Okay, so this is a very rich area of research. But I think probably we'd be more interested in the pattern-matching problem in general, okay? And also maybe I'll point out that you can extend this notion to cycles that involve more than three vertices. And things change a little bit, but some of the techniques stay the same. So let's look at another pattern match example. This one's a simplified version of a real example that comes from work with some of our collaborators. Okay so here, given a graph with edge labels is the extension we're gonna make. Instead of just source node, source vertex, and target vertex, we're going to assume that there's going to be a label on that vertex as well, so a third attribute in this table, if you will. And we're gonna be looking at, what this label represents is relationships like an object, someone knows someone else. And in this case, drug X interferes with drug Y. And so that label is interferes. So you sort of know what the relationship is, instead of it being a big anonymous relationship. Or say in the case of Twitter where you assume that all the relationships are the same, it's just follows. Or in Facebook, every relationship is just friends. Or in the web, every relationship is just links to. This one, now there are multiple different relationships being coded in the same graph, okay? And so this is a slightly more realistic case when you are trying to do real data analysis. All right, so here we might say that drug X interferes with drug Y and drug Y regulates the expression of gene Z and gene Z is associated with disease W. And so now maybe one pattern that we want to look for is find all drugs that interfere with some other drug that is involved in the treatment of some disease, okay? And of course you can put other restrictions on here too, or maybe you're looking for a specific disease or you're looking for a specific drug. But now we're just looking for all patterns of this type. Okay, so pictorially the query maybe looks like this, where we want to find an edge linking two vertices, that's labeled with interferes_with, that's then connected, where one of the outgoing edges of that vertex y is connected to a vertex z through an edge labeled with regulates, and then that vertex z is connected with a vertex w through an edge labeled with associated_with, okay? And we want to find all instantiations of this pattern, all right? So I'm not gonna yet talk about algorithms to do this. I want to talk more about languages to express this pattern, and some of you with a database background may already be thinking about how you might do this in SQL, and that's one of the examples I want to show. [MUSIC]