0:02

Okay, so let's talk a little bit about the distributions of degrees in a

Â network, and why, why do we care about this?

Â Well obviously, when we are trying to characterize networks in terms of basic

Â properties that they have knowing whether the most nodes have very similar degrees

Â or very different degrees can give us a lot of information.

Â So for instance you know, whether everybody has one or two links or some of

Â the nodes have eight and others have one, that's going to give us a very different

Â kind of network. And that different kind of network's

Â going to have different properties in terms of its difusion properties, its

Â path links and other kinds of things. So, you know, what we looked at in terms

Â of the GMP graph had, most nodes had order of magnitude, similar degrees to

Â each other. let's look at this in, in a little bit

Â more detail. So when we look at say these Erdos-Renyi

Â GNP random graphs. well the, the chance that a given node

Â has exactly d links, is just follows what's known as a binomial distribution.

Â Right? So the chance that I have exactly d links

Â present and d the rest of them not present, well I have chance p of a given

Â link being present raised to the dth power, then 1 minus p raised to the n

Â minus 1 minus d power. And how many different ways could I have

Â actually picked the, the other d nodes that I'm connected to?

Â well there's n minus 1 factorial over d factorial times n minus 1 minus d

Â factorial. So basically this is what's known as n

Â minus 1 choose d. that's the combinatorics of how many

Â different ways I could have exactly d neighbors out of a population of n minus

Â 1 possible other neighbors. Okay, so that, that's a binomial

Â distribution. And these networks are also known

Â sometimes as Bernoulli because you know, you're flipping coins, binomial or, or

Â Poisson random graphs. Why Poisson?

Â Well, if you want to approximate this if you want to approximate this formula for

Â large n and relatively small p. the, when you look at 1 minus p raised to

Â the n minus 1 minus d power, this looks roughly like e to the minus n minus 1, p.

Â And this combinatoric choose the expression looks roughly like n minus 1

Â to the d over d facotiral. So when you, when you bring these two

Â factors together it's roughly n minus 1 to the d that you're left with, and this

Â is well approximated by e to the minus, n minus 1 to the p.

Â So, you know, the binomial approximation here, by a Poisson distribution means

Â that we can represent approximately the probability that a given node has exactly

Â d neighbors by, this formula here which is known as a Poisson distribution.

Â Okay, so lets have a look at some of these random networks, lets, this is

Â actually a random network I drew on 50 nodes with a probability 0.02 of having

Â any given pair of nodes connected. And this is one realization that was

Â drawn in what's known as UCINET, you can find that on the web if you want a

Â network package. It's one of several that we'll talk about

Â on the course. so I drew this using UCINET.

Â random graph 50 nodes 0.02. So let's have a look at the actual degree

Â distribution, compared to a Poisson approximation.

Â So if we have a Poisson approximation, where you had an expected probability of

Â links of 0.02. Then you would get the square dots here,

Â the realized frequency is by the diamonds.

Â And actually, these two are very difficult to tell apart.

Â So the frequency distribution of how many nodes have degree 1, degree 2, degree 3

Â and so forth. There's a couple of to few nodes that,

Â that had degree two, but everywhere else we're pretty much on the graph.

Â you know, a couple things to note, out of our picture that we had, when we go back

Â here there's a bunch of isolated nodes. it looks pretty much, the rest looks

Â sort of tree like, there's no cycles in it.

Â so we see several components, no component has more then a small fraction,

Â we're just starting to see one large component emerge.

Â if we go and instead bump this probability up to 0.08 on 50 nodes, then

Â we get a much more connected graph. We still have one isolated node, and we

Â end up with y'know, more nodes having many more connections.

Â What's the actual distribution look like? Well, now when we look at the actual, the

Â Poisson approximation, we have a curve here.

Â And then when we look at the actual realized frequency, we end up with

Â something which is not exactly on target. the piece is a little larger, so the

Â approximation's not quite as great, but it's still fairly close.

Â And if you had more nodes and looked at a random graph on a larger set, then you

Â know, these things would tend to coincide even more.

Â But again, the poisson gives us a reasonable approximation of what what we

Â would have expected to be realized. so this is a poisson distribution.

Â What we'd see if we saw links formed indipendently at random.

Â 6:26

what thing, what sort of interesting is that when you look at some networks they

Â tend to have what are known as Fat tails. And one of the early looks at this in the

Â network context was by Price in 1969. And he was looking at citation networks,

Â and so, look, looking at, at papers that cited other papers, and articles that

Â cited other scientific articles, and then looking at how many sites given articles

Â have, and then you, you know, you could just put down sites at random, or you see

Â what you, you actually saw in the data. And he saw what were known as Fat tails.

Â And [COUGH], what that means is, effectively you saw too many articles

Â that had lots of citations and too many that had very few citations, so if you

Â look at the extremes you saw those being over represented.

Â And then numbers that were close to the average, you saw fewer having sort of a

Â middle amount of of citations, you either got very highly cited or very low

Â citation. this is related to a lot of other things.

Â There, there's a lot of things that have Fat tails.

Â these distributions go back to Pareto in the 1890s so things like wealth

Â distributions distribution of cities, populations around the world, words

Â usage. There's a whole series of different

Â things that have these kinds of Fat tails.

Â And in particular there's a well know paper by Albert, Jeong and Barabasi in

Â the late 1990s where what they looked at was webpages and connections to webpages

Â in the Notre Dame part of the world wide web at that point in time.

Â And what is plotted here, is a comparison between the in, in the blue are the

Â actual data of the world wide web connectedness, and in the pink color here

Â is what we would see if, if the links had been formed uniformly at random.

Â And the idea of a of a Fat tails here are the idea that when we look at the actual

Â representation, we've got, so this is log of frequency, log of degree.

Â we have a higher frequency of really high degree nodes, and a higher frequency of

Â really low degree nodes, and then the middle degree nodes are under represented

Â relative to things. So we have this Fat tails, and in fact

Â the distribution in this log log plot is almost linear.

Â so, you know, a line is a reasonably good match for what's going on in these data.

Â we'll talk about that in more detail as the course pers goes along.

Â But the important thing here, is that there are the distribution is a little

Â bit different than what we'd have gotten if we had just thrown down the lengths

Â uniformly at random. Kay, so this is known, roughly is what's

Â known as a as scale free distribution. the probability that a, given link has

Â degree d, can be thought of as proportional to the degree raised to some

Â power. And, part of the reason that this is

Â known as scale free is if we, you know, double the degree, then we, we, we scale

Â up, the relative probabilities, scale up so that.

Â the we'll still get the likelihoods, will be proportional to each other as, as we

Â are increasing degrees. And in particular if you take log of each

Â side of this what do we end up with? We end up with the log of, of the

Â probability, the frequency of having a given degree is going to be proportional

Â to, some factor times the log of the degree.

Â and so it, it, if it was fitting this, this distribution exactly, we should have

Â a line. And indeed, when we look at what we saw

Â here and, and you know, we get a reasonable approximation by a line in

Â those data. Okay, let's have a look at a completely

Â different kind of network and see whether we also see Fat tails or not.

Â So this is, Bearman, Moody, and Stovel's data, the high school romance data, that

Â we talked about just, a few moments ago. and here, we're looking at one high

Â school, nodes are colored by, gender, male, female.

Â And a link between two nodes means that they had a romantic relationship during a

Â time period that was captured in, in the surveys in, in this high school.

Â And, in particular, you know, here we see a different kind of structure.

Â We've got a bunch of, of, you know, 63 different diads, so just two people

Â connected to each other. We see one fairly large component which

Â has a couple of cycles in it, but looks almost like a tree.

Â you see some other different shapes here, a number of, of different configurations.

Â So, a bunch of small components, one larger component.

Â And actually, there's some isolated nodes that are omitted from the picture.

Â let's look at the deg, degree distribution of this one, and compare it

Â to the uniformly at random. So, here again, log of degree, log of the

Â frequency. And in this case when we look at the

Â actual data compared to the uniform at random, the fit of, if we actually try

Â and match these up, the fit in terms of matching up the data is a very close

Â match from just having. The uniform at random compared to a power

Â distribution, doesn't fit this as well, so this doesn't seem to have the fat

Â tails that the other one did. What does that tell us?

Â It tells us that degree distribution's of different networks can have different

Â properties. And if we're looking at some, they might

Â have something that looks much more uniform at random, what this tells us

Â about romance in high schools. but it it's saying that this is looking

Â much more like a a purely at random graph whereas the other looked like it had

Â something else going on in terms of its degree distribution looking more like a

Â power distribution. So, that gives us a a brief look at

Â degree distributions. We're going to be visiting those a lot

Â more in the course, they're [UNKNOWN] a very important way of characterizing a

Â network. And understanding how dense the network

Â is and also what the sort of variation is in degrees across the population.

Â so let's have a look at some other definitions that are going to be

Â important in networks.

Â