Welcome to Lecture eight of Networks, Friends, Money, and Bytes. Today's lecture we'll try to formulate and answer the question of, how do I influence people on Facebook and Twitter? Now, network consists of both the topology represented by the graph and functionalities which are the tasks carried out on top of the graph. In this chapter, we talk about topology dependent influence model in contrast to the population dependent and topology independent influence models in the last lecture. As most of us know very well that Facebook and Twitter have already become among the dominate communication modes, especially among younger people. Facebook started in late 2003, and by the time of spring 2012 of its IPO, it's got over 900 million users worldwide. And Twitter started much later and by 2012, got over 500 million users worldwide. And one estimate says that within a day, it processes over 250 million tweets. Now, Twitter, in particular, provides what we call a direction node link. Because you may subscribe to my tweets but I don't subscribe to yours. And indeed, Twitter's success builds on top of micro-blogging and group texting and one-sided or one direction social networking. Now, given the popularity of these services, there is no shortage of interesting questions people have raised. One question is, how do I quantify the statistical properties of opinions on these platforms? For example, a recent study were carried about Tweets of Oscar nominated movies with about twelve million tweets over the month of February 2012, show that there is a very substantial positive skew of the tweets opinion about these movies compared to other online platforms such as IMDB or Rotten Tomato. And, statistical properties of the tweets alone did not provide insightful prediction of either the box office success, or the critical claim However, if it is combined with some of these other online opinion venues, Then it does enhance the predicted the prediction accuracy. There's another question that is being addressed by a variety of companies. That is, how do I measure the influence power of individuals? For example, companies like Clout computes what it calls the clout score. It combines your influential power on platforms such as Twitter, Facebook, And other kind of blocks into a numerical score. Now, how do they measure that? There are variety of possibilities. For example, you can measure on Twitter the number of re-tweets of your tweets, okay? As we identify through a party of via. We can also look at the number of re-posting of URL's that you posted. If you have access to the topology of who follows whom, For example, I follow your tweets, and you follow Anna's tweets, and so on. Then, this typology can also give us some information about influential power of different nodes which are different Twitter users. Now, what kind of numerical scores is meaningful enough to be used? And that author remains a question mark. And that relates to the third question, How do I leverage the above knowledge of the either aggregate statistical properties of Facebook and Twitter users properties, or individual influential power source and turn these into a viral marketing campaigns. For example, maybe I'll be able to seed some users and ask them to post or tweet certain information about my product or service. And the question is, which nodes to seed? If I know the topology, would it help me to answer this question more precisely? So, these three questions are indeed important. And, as we will see, there is a significant gap between the theory and practice. Just like last lecture, this lecture offers the biggest gap between theory and practice in networking. And these questions are not entirely new either, the question of influential power of individual users that many interesting historical anecdotes. For example, the famous Revere-Dawes Ride that happened over the night of April eighteenth to nineteenth in 1775. This was the ride that helped alert the American forces to fight against the British military forces. Two individuals, Paul Revere and William Dawes, they both liked to alert the local American forces about the imminent British action. They took different paths from Boston to Lexington, and then emerged at Lexington and ride together to Concord. For example, this is the path that Revere took from Boston in Massachusetts to Lexington. And it turns out that the path took by Revere went through a lot more influential people in the neighborhood and was much more effective in alerting the local militia leaders to prepare for the British military action which led to the success of the first battle of the American Revolutionary War. So, how can we quantify that a certain path pass through more influential nodes than other paths? Another very famous anecdote is this graph. This graph is so called the Padgett's Florentine family graph which shows fifteen prominent families in Renaissance period of Florence, Italy. A few of these fifteen families aren't shown here with their names. The Medici family was widely viewed as the most influential among the fifteen families. If you look at this graph where each node is a family, and a link is a bidirectional link, representing some kind of kinship or marriage relationship. Then, how can we make intuitive understanding of this belief that the Medici family was by far the most influential? Well, if you look at a naive view, for example, how many links does each family have? Then, the Medici family had six links. That's indeed the largest number. It's got the largest degree in this graph of family relations. But there are also other families such as the Strozzi family and the Guadagni families. They also got each four neighbors, Okay? So, the degree count is not that much of a difference. And this is 50% higher than four, but was widely believed that the Medici family was a much more influential than a mere 50% differential. So, how can we quantify this intuitive understanding of Medici's family, being more important than just 50%?. There's many other stories, For example, Granovetter famous social network theorist and empiricist. He did a 1973 survey in Amherst, Massachusetts. And he discovered what he called the strength of weak ties. Turns out that, if you look at the graph of people where each node is a person and a link is some kind of family or friendship, relationship. Then, he showed that certain links which might be viewed as weak, For example, it goes all the way to another node. But there are very few of the past to connect this node to that other node as opposed to these nodes which are very closely knit together. Later, we will quantify the notion of a group in the advanced material part of the lecture, How do we quantify closely knit together idea. But, intuitively and visually, you can see that these nodes are very close together, okay? And these links reach out to some far out nodes that are not tightly knit. And these are sometimes called the weak links. And yet, they present a high strength in disseminating information such as, employment opportunities. So, how can we understand the importance and the utility of different links? In order to answer all these questions, historical ones, the current ones, we have to, represent our important quantities with some symbols. So, we start with, of course, graphs. This is already Lecture eight. So, in the past seven lectures, we have seen quite a bit of graphs. You may still remember the graph of interference channel. You may remember the graph that represents the auction, the bipartite graph of matching. And now, we're going to talk about graphs representing the topology of a network in general terms. And then, we will use matrices to algebraically represent these graphs, so that we can start running some mathematical operations. Well, in general, we say a graph, G, is a data structure that consists of two topples. First, is a set of nodes or vertices. It's the set V. And then, it's the set of links or edges, set E. Each element in this set say, indexed by I, is simply one node. And then, each element in this set is basically a two toe, i, j, represents a link from node i to node j. If we say that, i, j is not equal to j, I, then we call this graph a directed graph, okay? Because each link has a direction. Node i pointing to j doesn't mean that j points to i, okay? Whereas, if every link is actually both i to j and j to i, then we just skip writing these arrows on the line. And with implicit understanding that this is a by directional link and therefore is a graph without directional links. In all the graphs that we'll be looking at, we look at the so called simple and connected graphs. Simple, meaning that each link just connects to two nodes. And connected means that every node is connected to every other node somehow, someway. Maybe through a very long path, But nonetheless, they're all connected. We don't have a disconnected node. Like this. Although, in homework problem, we'll be, be looking at generalization into what's called the hyper graphs. Now, we're going to use some matrices to describe these graph's connectivity patterns. We'll first look at what's called the adjacency matrix represented as a bold face A. This is a matrix of dimension N by N, where N is the number of nodes. And this matrix is very easily defined. We say the ij-th entry of this matrix is one if there is a link from node i to node j, and zero otherwise. For example, if I have the following graph, okay? So, very simple graph, Nodes one, two, three, four. Then, the four by four matrix, A, is the following entries, Okay? One is connected to three. So the first row, we'll say zero, cuz there's no self loop. And then one, one, and then zero cuz one is not connected to the four. And then, the second row, we'll say, is connected to node one and three, not four. Three is connect to one, two and four, And four is only connected to three, Okay? So, this is the matrix A for this graph. As you can see, the diagonal is all zero because there's no self loops, A node does not point to itself. And it's a symmetric matrix meaning, Aij = Aji which is really an algebraic way to say that this is a graph with directional, bidirectional links. Now, if it is a direct to graph, the links have directions, then how do we define Aij then? It's defined the same way, however, the ij's may be one but ji's might be zero. For example, if we say one goes to two, two goes to three, three goes to one and four goes to three, Okay? Then, the new matrix would be the following, Okay? One goes to two, Okay? And two goes to three, Three goes to one, And four goes to three. Now, as you can see that Aij is not equal to Aji, for example, This is one but this entry is zero. A21 is not equal to A12. The second matrix that we actually use much less in this course is called incidence matrix. This is a matrix we represent by A hat of dimension N by L. N, as before, is the number of nodes whereas L is now the number of links here. So, how do we define the entries of this matrix We say A hat ij-th entry equals one if node i is on link j, and zero otherwise. So, it's a binary matrix where it's one if there's an incidence of node i and if there's incidence of link j on node i, Okay? For example, in the last slide we look at this example, One, two, three four. Now the matrix A hat will look like the following. We first have to write down also the labels for these links, okay? Let's write down these links as A, B, C and D, Okay? So, these are the one, two, three, four, these are A, B, C, D. So happens that still four by four biggest number of links equals number of nodes in this particular example. So, link A is incident on both nodes one and two. Write down one, one here and zero, zero. And B's incident on both two and three. C is incident on one and three, And D is incident on three and four. Now, what if it is a directed graph, okay? Saying the links are like this. Then, we say, the entry is minus one, if nodes i, node i stops, terminates j link. And it's one if node i starts link j, and zero if node i is not related, is not on link j, Okay? In this particular example, we see that node on link A goes from one to two. So, start at one ends at node two. And link B starts from two ends at three. C from three to one it's actually from here, from three to one should be positive. And, D is from four to three, okay? The starting point is positive one, the ending point is negative one, and zero everywhere else. As you can see, every column of incidence matrix of the rectagraph must have exactly one, one, one minus one, and the rest are zero, Precisely because all the graphs we're looking at are simple graphs. But, for each row, we could have multiple plus ones and multiple minus ones because each node might start or terminate multiple links. So, equipped with this basic set of notation of the graph and its matrices, now we can start to talk about how do we quantify the important scores of a node.