So how do we quantify the importance of nodes? One obvious choice is to use degree. For example, for a node with two edges coming in and three edges going out, we say the in-degree is two, the out-degree is three, and the total degree is five. Dunbar's number, usually quoted at around 150, is used by sociologists as an estimate of the largest number of friends an average person can maintain in a social network. But we also know that degree can be a misleading number for quantifying node importance. For example, we may want to say that having more important nodes pointing to you makes you more important. That is exactly what we did in Lecture 3 with Google's ranking of webpages. There we constructed the matrix G, which is the connectivity matrix H plus rank-one matrices, viewed that matrix as a random-walk-on-a-graph model, and then looked at the dominant eigenvector: the eigenvector associated with the largest eigenvalue of G, which in that case is one. We then ranked webpages based on the entries of this vector. That is what we did with PageRank back then. In general, this is the idea of eigenvector centrality; centrality here is just a fancier name for the importance of nodes. The eigenvector idea says: take the dominant eigenvector, not just the eigenvalue, of some meaningful matrix, such as the adjacency matrix A that we just defined. So why would you want to do that? One possible motivation comes from the following linear dynamic system interpretation. As in the homework problem, you see that we can often model many useful quantities in the dynamics of some functionality on a graph by looking at a vector x, with one entry per node, evolving over discrete time indexed by t: we multiply an initialization vector x(0) on the left by the adjacency matrix A, applied t times, so x(t) = A^t x(0). Now how do we understand this expression?
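As a minimal sketch of the degree bookkeeping above, here is a hypothetical directed graph (not the lecture's example) stored as an adjacency matrix, where node 0 is the node with two edges coming in and three going out:

```python
import numpy as np

# Hypothetical 5-node directed graph: A[i][j] = 1 iff there is an edge i -> j.
# Node 0 has out-edges to 1, 2, 3 and in-edges from 1 and 4.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
])

out_degree = A.sum(axis=1)        # row sums: edges going out
in_degree = A.sum(axis=0)         # column sums: edges coming in
total_degree = in_degree + out_degree

print(in_degree)     # [2 1 1 2 1]
print(out_degree)    # [3 1 1 1 1]
print(total_degree)  # [5 2 2 3 2]
```

So node 0's in-degree is two, out-degree three, total degree five, exactly as in the running example.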
We know that any given vector can be decomposed, that is, represented, as a weighted sum of the eigenvectors of this matrix A. Call those eigenvectors v_i and the corresponding eigenvalues, which are scalars, lambda_i. We can represent any initialization vector as the linear combination, with weights c_i, of these eigenvectors: x(0) = sum_i c_i v_i. Now substitute this expression back into the evolution, and we see the following expression: A^t multiplying the summation of the c_i v_i vectors. Again, the c_i are the weights that provide the linear combination among the v_i to give you this vector. Because this is a linear operator, we can move the matrix multiplication inside the summation. Now, if you multiply an eigenvector of this matrix on the left by A, you just get lambda_i v_i, and you have done this t times. Therefore each term is just c_i times lambda_i raised to the power of t, times the vector v_i. In other words, we have expressed the time-t vector x(t) as a linear combination of the eigenvectors of this matrix A, where the weights are c_i times lambda_i to the power of t. So as time goes on, what dominates this summation is really the term associated with lambda_1: this becomes basically c_1 times lambda_1, the largest eigenvalue, raised to the power of t, times v_1. In other words, x(t) for large t is approximated, up to scaling, by v_1, the dominant eigenvector, the one associated with the largest eigenvalue of matrix A. So we can take the dominant eigenvector of matrix A as the second notion of centrality. The third notion of centrality is what's called closeness, and this takes a distance point of view. Suppose we've got two nodes, i and j, and there might be many paths connecting i and j. Let's say the distance between nodes i and j is the length of the shortest one, denoted d_ij.
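The convergence argument above can be checked numerically: repeatedly applying A to any positive starting vector (and renormalizing so it doesn't blow up) lines the vector up with the dominant eigenvector. A minimal sketch on a hypothetical 4-node undirected graph:

```python
import numpy as np

# Hypothetical symmetric adjacency matrix: a triangle {0,1,2} plus node 3
# hanging off node 2. Any small connected undirected graph would do.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

x = np.ones(4)              # initialization x(0), all-positive
for _ in range(100):        # x(t) = A^t x(0), renormalized each step
    x = A @ x
    x = x / np.linalg.norm(x)

# Compare against the dominant eigenvector from an exact eigendecomposition.
vals, vecs = np.linalg.eigh(A)
v1 = vecs[:, np.argmax(vals)]
v1 = v1 * np.sign(v1[0])    # fix the arbitrary sign of the eigenvector

print(np.round(x, 4))
print(np.round(v1, 4))
```

After enough iterations the two printouts agree: the lambda_1 term has swamped all the others, which is exactly the c_1 lambda_1^t v_1 approximation.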
Later, in Lectures 9 and 13, for social and technological networks respectively, we will look at how exactly we can compute, hopefully distributedly, these d_ij's; searching for the shortest path is not an easy job. But for today, let's assume somebody has already done that, so we can tabulate these d_ij's for all node pairs (i, j). Once we've got this list of all the d_ij's, we can do a few things. One is to pick the largest one, and we call that the diameter of the graph. It is the maximum d_ij across all node pairs: the largest of the smallest path lengths, smallest for a given node pair, and largest across all node pairs. We can also look at the average, instead of the maximum, of these d_ij's. So we can sum up the d_ij's and then normalize, for example by the number of nodes, or by the number of nodes minus one, as is the convention. Now, this average actually runs in the opposite direction of closeness. The idea is the following: pick a node i here and fix it, then look at all the other nodes j; there are N minus one of them. I look at the distances from me, node i, to all the others, and take the average. If this number is small, then I am more important, because I have a very short distance to all the other nodes. If this number is big, I am less important. So, just to make the direction work out, let's take the reciprocal of this: (N minus one) divided by the summation of d_ij over all j, for the given fixed node i. This is what people call the closeness centrality, denoted C_i for this given node i. If you have a high C_i, it means that you have a very short shortest distance to all the other nodes, on average. Now, we didn't have to use the average, the simple mean; we could use other kinds of means, for example the harmonic mean, and so on. Now, this is not the end of it. There is yet another centrality measure that's often used.
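Taking the lecture's assumption that someone has already found the shortest paths, here is a minimal sketch that computes them by breadth-first search on a hypothetical 5-node undirected graph, then forms both the diameter and the closeness centrality C_i = (N - 1) / sum_j d_ij:

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list:
# node 0 is a hub for 1, 2, 3, and node 4 hangs off node 3.
graph = {
    0: [1, 2, 3],
    1: [0],
    2: [0],
    3: [0, 4],
    4: [3],
}

def bfs_distances(graph, source):
    """Shortest-path distances d_ij from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

n = len(graph)
closeness = {}
diameter = 0
for i in graph:
    d = bfs_distances(graph, i)
    diameter = max(diameter, max(d.values()))
    closeness[i] = (n - 1) / sum(d[j] for j in graph if j != i)

print(diameter)                                       # 3
print({i: round(c, 3) for i, c in closeness.items()})  # {0: 0.8, 1: 0.5, 2: 0.5, 3: 0.667, 4: 0.444}
```

The hub, node 0, gets the highest closeness, matching the intuition that a short average distance to everyone else means more importance.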
In fact, there are quite a few others as well, including another one in the homework problem. But today we'll stop with this fourth notion of centrality, quantified by the notion of betweenness. What does that mean? This goes back to Granovetter's 1973 study, which says that if a node sits along the shortest paths of many other node pairs, then it has some strategic importance, in the thick of the action. If many other node pairs want to reach each other along an efficient path, they have to pass through you: you are in between many node pairs, you are on many shortest paths. This is the idea of betweenness. How do we quantify it? Let's first denote by g_st the total number of shortest paths between two nodes s and t. Then let's denote by n_st^i the number of those shortest paths that node i sits on, for the pair (s, t). Clearly n_st^i is less than or equal to g_st. Now, this requires you to count, for each pair (s, t), the number of shortest paths. If there is only one, and node i might or might not be on it, then g_st is one and n_st^i is either one or zero. So now we want to say: for each node pair (s, t), there may be many shortest paths, so we normalize by g_st. How many of them is node i sitting on? This expression, n_st^i divided by g_st, we then sum up across s and t. If you sum across both the s and t indices independently, then we're actually double-counting all the node pairs, which is fine if you want to stick to that convention. But most books stick to the following convention: you just order all the nodes, sum across all s, and then across all t less than s in the ordering of the nodes. This avoids double-counting node pairs; it's just a factor-of-two difference in the end. Now we call this summation, the sum of the fractions of the shortest paths that node i sits on, across all the other node pairs, the betweenness of node i.
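The counting described above can be sketched directly. On a hypothetical small undirected graph (same 5-node hub graph as before, not the lecture's example), one BFS per node gives both the distances and the shortest-path counts sigma; node i then lies on a shortest s-t path exactly when d(s,i) + d(i,t) = d(s,t), in which case n_st^i = sigma_s(i) * sigma_i(t):

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: node 0 is a hub for 1, 2, 3; node 4 hangs off 3.
graph = {
    0: [1, 2, 3],
    1: [0],
    2: [0],
    3: [0, 4],
    4: [3],
}

def bfs_counts(graph, source):
    """Distances and number of shortest paths (sigma) from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # v is one step further: u feeds paths into v
                sigma[v] += sigma[u]
    return dist, sigma

dist, sigma = {}, {}
for s in graph:
    dist[s], sigma[s] = bfs_counts(graph, s)

betweenness = {i: 0.0 for i in graph}
# Sum over unordered pairs s < t, which avoids the double counting; endpoints excluded.
for s, t in combinations(graph, 2):
    g_st = sigma[s][t]                   # total number of shortest s-t paths
    for i in graph:
        if i in (s, t):
            continue
        if dist[s][i] + dist[i][t] == dist[s][t]:   # i is on some shortest s-t path
            betweenness[i] += sigma[s][i] * sigma[i][t] / g_st

print(betweenness)   # {0: 5.0, 1: 0.0, 2: 0.0, 3: 3.0, 4: 0.0}
```

The hub node 0 sits on the shortest paths of five of the pairs, node 3 bridges node 4 to the rest, and the leaf nodes are in between nobody.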
B_i for each given node i. So now we've got the four notions. Degree, which we know is often not a useful quantity. The eigenvector centrality: the dominant eigenvector of the adjacency matrix. The closeness centrality, C_i, for each node i. And the betweenness centrality, B_i, for each node i. So we've got four different metrics. Which one to use? Again, it depends on what you want to do. But first, let's illustrate this with a simple example. This is a network with just eight nodes, and we could compute the four different centrality measures for each of the eight nodes. That would take too long, so let's focus on just two nodes, nodes 1 and 2 here. If you just look at the graph, intuitively you say: well, node 1 has a degree of three, node 2 has a degree of three, but they shouldn't be equally important. In particular, this is like a dumbbell topology: the graph has a left side and a right side, and visually I can see that there is an important link between the two. Node 1 is part of that link, whereas node 2 is not. So node 1 should be more important than node 2. How do we quantify that notion? Clearly not by degree. Well, let's then look at eigenvector centrality. That means we have to compute the matrix A. What is matrix A? This is a graph without directional links, so it will be a symmetric matrix. I'll just write down the first two rows here. Clearly the first row says node 1 connects to nodes 2, 3, and 4; that's it. So it's zero, one, one, one, and then zero, zero, zero, zero. The second row says node 2 connects to nodes 1, 4, and 5, so there are ones in positions one, four, and five, and zeros everywhere else. You can fill out the rest, or you can take a look at the textbook. Now all we need to do is compute the largest eigenvalue and the corresponding eigenvector. The largest eigenvalue, which we're not using directly here, is 2.87, and x is the following vector.
The first two components, which are what we're focusing on right now, are 0.406 and 0.345. So you see that node 1 is more important than node 2 now. At least they're not equally important anymore, as they were under degree, although node 1's eigenvector centrality is not a whole lot bigger than node 2's.
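The full 8-node adjacency matrix is in the textbook, so rather than guess at it, here is a minimal sketch of the same computation on a smaller hypothetical dumbbell graph: two triangles joined by a bridge edge. The eigendecomposition ranks the two bridge endpoints highest, which is the same qualitative effect the lecture example shows for node 1:

```python
import numpy as np

# Hypothetical dumbbell graph on 6 nodes: triangle {0,1,2}, triangle {3,4,5},
# and a bridge edge between nodes 2 and 3.
A = np.zeros((6, 6))
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

vals, vecs = np.linalg.eigh(A)   # A is symmetric, so eigh is the right tool
k = np.argmax(vals)
lam1 = vals[k]                   # largest eigenvalue
v1 = np.abs(vecs[:, k])          # connected graph: Perron eigenvector is entrywise positive

print(round(lam1, 3))
print(np.round(v1, 3))
```

The bridge endpoints 2 and 3 get the largest entries of v1: being attached to the link between the two sides raises your eigenvector centrality even though every node here looks locally similar.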