Welcome to Lecture Nine of Networks: Friends, Money, and Bytes. In today's lecture, we will try to formulate and answer the question, can I really reach anyone in six steps? Now, this very famous six degrees of separation, dates back many decades ago. To the belief by sociologists that in social networks of various kinds, the distance between no pairs, and between people are actually quite short, but was really codified by Milgram's experiment in 1969. Milgram asked people living in Omaha, Nebraska to send a letter to a person living in the suburb of Boston. And there is the letter, with a name,address, and occupation, and that's it. No other information provided about this person. And you probably think, yeah I need to know the name and address. What about his occupation? Turns out that it is very helpful to know the occupation, because the rule of the experiment says that, you cannot directly send a mail to the person in Boston suburb. You can only forward the mail to someone you know by first name. This is the definition of friend, in Milgram's experiment. It is the definition of a link, in this graph of social network. So you can only send it to somebody you know by first name. And then, that person can further forward this letter to someone that she knows by the first name. Until it reaches somebody who says, yeah I know this person by first name. It was a very interesting experiment. And all together 296 people were asked to participate. It turns out that 217 of them actually send out the letter. And out of those, 64 were completed. Meaning that it reached the final destination. I think this is kind of a less than 30% completion rate. Well, indeed, we have to properly take into account in our statistical analysis, what happened to the other 70% of the participants. But among these 64, you see something very interesting, okay. And also, in later replica of this experiment, such as by email in the 1990's. Actually the completion rate was only 1.5%, you know much lower. So if you look at the 65 list of letters that completed, and then count the number of hops it takes to reach this final destination. And then you look at histogram of that list of numbers, you see actually it's quite a small number. The median length of the search chain. This is a search chain, it's a chain of a social search. And the median length out of the 64, is only six. And this is the magic number that leads to the so-called six degrees of separation. And it says, roughly speaking. And as we'll see, very roughly speaking. In this entire social network we live in, called the Earth, we can reach anyone in six steps or less. Well first of all, rigorously speaking, this is only the median length, okay? It is not the maximum length. There are very long chains. You can only say that, roughly speaking, it is quite unlikely that you can complete the search team, in a few number of steps. Okay, and the median is six. There was so many extensions of this basic result, [COUGH] that it would take actually a whole course to do nothing but to tabulate all those. For example, people have looked at the network of scientists, the network of movie actors and actress. Here, a link is a co-authorship relationship. Here is a co-starring relationship, and the list goes on, and on. And, to quantify the degrees of separation, and usually it's a pretty small number, even though the overall population is huge. And in 11, November, it was reported that Facebook has a four point seven four degrees of separation, which is even smaller. So the question was raised, is Facebook flattening our social world further, bringing six degrees of separation down to less than five. We'll later come back to this four point seven four. But first of all, the question we have to raise is, what is the definition of link on Facebook? As we mentioned last time, does that mean that you are in my friendship circle? Does it mean that we have mutual wall posting? Or does it mean that we have a regular enough pattern of communication? All right, later we'll come back to this four point seven four even further. So here's the histogram of the search chain in Milgram's experiment. Some chains were very short. Two people actually, so happened they know directly the final destination's person, and some chains took ten steps. And you have to think that, out of the 70% or so that did not even complete, some of them were because the chain was so long as it drags on, there's some chance of it getting lost along the way. But the mean was five point two, the median was six. Turns out also to be the mode. And you'd wonder, what kind of chain does it look like? I was drawing a chain, but that was hardly a representative one. Here is a representative one. Here's a source, somebody living in Omaha, Nebraska, here is the final destination in a Boston suburb. And usually it goes through some short range lengths, okay. And here, by short range, I mean physical distance, short range. Later we'll also see, there's also social distance of various kinds. And then, you will usually use one, or sometimes two, so called long range lengths. In the physical sense, this can go, say, all the way to New York, for example. And then, will through some further local chains, before reaching the destination. This long range link, is a very interesting phenomenon. We'll soon discover that, without these long range links, it will be very difficult to have a small world. Now, how would you make sense out of this six degree of separation? Is that surprising or not? Our first thought, you might say, only six steps? I can reach anyone in six steps? Well, that's kind of surprising. I thought I have to take many more steps in this world of many people. So this is a surprising result, but then on second thought you say, wait a minute. If I say have only 20 friends, not a very popular person per se, but 20 friends. And each of them has also 20 friends, again not hugely popular. I know if you look at Facebook, as an example, a lot of people have thousands, tens of thousands of friends, even though most of them really cannot qualify as a link in Milgram's example, but still 20 is not a huge number, Dunbar's number is 150, right? So assume that all these friends of mine, I only have 20 friends. Then in six steps, I can reach 20 to the power of 6 which is about 64 million people, so no wonder with 50% chance, six steps suffices. Six steps or less, because even for this kind of configuration, I can already reach over 50 million people. So, what's the big deal? Plus 6 steps suffices. There's a lot of people that I can reach, which is not surprising. And then on the third thought, say hold on a second. There's a lot of overlap, okay. Here's me and I got 20 friends, all right. And I was thinking grows like a tree, each of them got 20 friends. But the thing is, that a lot of my friends' friends are also directly my own friends. There's a lot of duplication here. I cannot reach out to all another set of 20 new people from each of my friends. In fact, maybe all his friends are mutual friends too. So, this tree does not grow, like what we just mentioned, 20 to the power of 6, then what? Well we'll have to think about this phenomena, or try to model the homophily in a social network, refers to the fact that a lot of times like minded people, they tend to know each other. So my friends friends tend to be my friends directly. And this is a specific notion of homophily called the transitivity in the social network. Remember, this transitivity refers to a property of a graph. It has nothing to do with the transitivity in voting theory that we talked about back in lecture six. And one way to further quantify this notion of transitivity, is look at a specific case of that called triad closure, as you can see from tri, means a triangle, a three-way closure. And to quantify the amount of triad closures in a given graph through a single scale or number. The metric of clustering coefficient. So this quantifies triad closure, which is a special case of transitivity measure which is one model of homophily in a social network, and that will explain this phenomena of a lot of overlaps. So whatever explanatory model we come up with for the six degrees of separation, it can't be this tree growing 20 to the power of 6. It has to be something that also observed homophily or transitivity. So here is an example of a triad closure. We've got Alice, Bob, and Chris, okay. So Chris knows two friends, Alice and Bob. Then we say, if there is also a link, this blue line here, that Alice and Bob also know each other, let's assume that's all links are bi-directional here, then we say the triangle is closed with this link. This is what we call the triad closure. So we wonder if I count all the connected triples, this is a connected triple okay, in the graph I'm given. How many of them actually close the loop and forms a tri closure? Not all of them, but some of them, and it's believed that in human social networks, there is a lot of closures, so I would like to basically count the number of closed triangles as a percentage of the number of connected triples, whether they are closed or not. It turns out this will not be exactly the metric we use the formula because it doesn't normalize to what we want, but it captures the essence of the idea. If this is a big number, then we say, we indeed observe a lot of tri closures, and therefore transitivity in social network. So our explanatory model, what we come up with as a model to explain the existence of short path, got to also observe a high clustering coefficient something like this. So that remains a mystery at this point in the lecture. It will later resolve it, but it is surprising. Now we see finally the first true surprise of six degrees. Not because they are short path, but because they're short path despite having transitivity in the network. And yet there is one more thought, that is the second and the true surprise, what we have talked so far is so-called structural small worlds. It says that the median of the shortest to search path are small numbers, despite some short search, change exist, despite, Triad closures are more generally despite transitivity. That's called structural, okay? It does not talk about the functional properties. Remember all the way back to lecture two when we talked about structural properties based on the graptheratic notion of a network vs functional models which are the functionalities running on top of the graph. For example the hyper text connected web pages, that gives us the graph and associated matrices and the structure. Then when we start adding things like random navigation, then we are starting to model functionality, the functionality of web browsing behavior. And we, in the last two lectures have also seen purely grapheratic notions, plus functionality based. For example, in to equation are dynamic systems on top of the graph, and in today's and in the next lecture, we will indeed talk about the interactions, both positive and negative ones between structural and the functional properties of a network. In particular in today's case, we have just talked about the structural small world. But there's something more, something much more, which is algorithmic small worlds. That's the true surprise. What is algorithmic small worlds? Now remember in Milgram's example, he wasn't just trying to find out the existence of short path. He's saying that this short path can be discovered. By very limited local information. Right, you are standing in the path here. And you really have no idea what the over all social network looks like. So you have to decide which among your friends you're going to forward this letter to. Okay, friends are the people that you know by first name. You will look at the final destination. The name, probably tell you something. It probably tells you the sex and even the ethnicity of the destination. The occupation also tells you something. For example, in Merriam's case, it's a stock broker. So you may think, maybe a banker is close enough to this occupation. I'll try to look for my banker friends, okay, as opposed to say a nurse. And of course, the physical address also tells you something. You probably want to forward to somebody closer to Boston rather than going out to the west, to Los Angeles, for example. In other words, when you are deciding which one to forward to, you are solving some local optimization based on certain metric, and you pick one. Let's say with the, likely, the highest likelihood of getting closer to the destination. And there are two issues here. One is, what metric to use, we just mentioned three of them. Probably, will be a composite metric, some combination of all of them with different ways in your mind. And second question is, should I pick the one which has perhaps the best composite score towards their destination? If so, then it's called the greedy social search. There are other alternatives, for example, I may say, you know what? I'll just pick a friend of mine who is the most popular person. So even though I may not know this person, I trust this one have so many friends, she may actually know this person. If we follow that one, we'll soon reach a hub of the network connectivity. But in most models of algorithmic small world, people assumed that a participants follow a greedy strategy, okay? First come up with a composite score based on these metric distances, some are easy like address, physical distance, some are very difficult like occupational distances. But whatever you use to quantify, come up with a composite score, then follow the greedy method to form with a letter. All right, now you've got a routing algorithm actually. Later in lecture 13, we will talk about routing algorithms for machines to follow in the IP, on the internet protocol. But in the social network story today, this is a greedy routing algorithm. You have limited information, only locally available to yourself, and then you make a local greedy decision. You can see, this is a pretty sub-optimal likely way to do the forwarding. And yet, you can still find a short chain. So it is the discoverability of some short chain, may not be the shortest chain but a reasonably short chain. That is truly amazing. Not just the existence, there is a short path but also some short path whether that one are close enough, approximation can be discovered by low code interactions, by local decisions. That is what we call algorithmic small worlds. Fix a routing strategy like this really, greedy one and yet you can discover short path. How can we reverse engineer that phenomenon which was observe in Merriam's example. That would be a second task. So first task, second task for the rest of this lecture.