Let's answer the first question first, about Structural small worlds.

Let's use symbol L to denote the median of shortest path between node pairs.

So, if I got node pair i,j and the shortest path distance between them is dij

and then I look at all the node pairs and the corresponding shortest path distance.

Histogram it and then I get L as the median.

And I want L to be small. And of course, L will be small if there

are only a few nodes, So really, I want to say L is small

relative to the number of nodes, say n, And, more precisely, I want L to grow like

a slow function of a number of nodes. How slow?

For example, logarithmic function, Then even for exponential growth after

taking the log, it become just linear. So what we now want to do is to reverse

engineer and we'll comment on this, the utility and pitfalls of reverse

engineering, topology or functionality of a network towards the end of the lecture.

Want to reverse engineer and build an explanatory model to explain the

observation like what has do. To explain that observation this time

about the social network that L is often a slow function the number of nodes.

But whatever explanatory of model we have, it can be just a tree growing like this

twenty to the power of six, cuz we know that violates another observation

empirically made in social network, That is there's a lot of transitivity

quantified for example by a clustering coefficient of a graph quantifying the

degree of existence of triad closures. And we're going to use C to denote this

metric clustering coefficient. I just want to highlight C clustering

coefficient of a graph has nothing to do with the clustered density p that we

talked about in the last lecture, In contagion model.

Okay? Clustering coefficient of a graph is not

the cluster, the density of a subset of nodes.

So what is clustering coefficient? Remember, we're trying to quantify the

following observation that in many social networks, if a, b, c forms a connected

triple like this, then there's a good chance that it also closes the triangle.

Okay, that's why it's called triad closure,

So, we want to look at the number of triangles versus the number of connected

triples. And, we want this metric to be normalized,

so it always lies between zero and one for all graphs.

In particular, if it is a triangle, then we want the C to be one.

In order to accomplish this normalization we're going to divide the number of

connect triples by three. Why? Because of different ways we count,

connected triples versus the count of number of triangles.

If there is a, A link here, then the triangle exists, we

call that triangle a, B, c. We count it only once.

But when we count a number of connected triples, we actually have to count a, b

and b, c, two links coexist, then there's this connected triple whether there's a

link a, c or not. If there is one, we'll also have a

triangle. If not, we don't.

But, to connect a triple, this is one connected triple.

And then, we can also write a, c and c, b, two links that is another connector

triple. And b, a link and a, c link, that's

another connector triple. So, just because when we count connected

triples, we count basically three times, so we divide it by three to take into

account that over-counting. And therefore now, c, d will be guaranteed

to lie within zero and one. For just a triangle, it will be one.

For just a connected triple, which is a line here, it will be zero.

Now the question for a bigger general graph,

What will C look like? We want C to be big, and L to be small for

our explanatory model. So, here's another graph, slightly bigger,

four nodes and four links. Let's see what is the clustering

coefficient for tri closuring this graph. Well, for this graph, this number of c is

actually easy to compute. Alright, C equals the number of triangle

over the number of connected triples over three.

How many triangles are there? Whether there's only one triangle, okay

one, two, three nodes, triangle. What about connector triples here?.

Well, there are let's see, how many connect triples.

Two, one, three, that's a connected triple.

Okay? One, two, three, that's another connector

triple. Okay? one, three, two, that's another

connector triple. One, two, four, That's another connector

triple. And three, two, four, that's another

connector triple. So, there are actually five connectoe

triples. Of course, some of them we basically

counted more than once, So, we have the normalization divided by

three. And that means the clustering coefficient

of this graph is three-fifths, which is a pretty big number considering that C must

lie between zero and one. Okay?