Hi folks. So now we're going to talk about another
property which is important in characterizing networks, and in particular one which
looks at a local feature of the network.
So, in particular, what's going on when we zoom in on given nodes and begin to
understand the relationship between different ties in the network? This is
known as clustering. When we begin to think about asking how dense a network
is at a local level, we could ask the question: what fraction of the people
who I'm friends with are friends with each other?
And so clustering looks at the following: if we have a given node i, and we
look at two of i's friends j and k, what's the chance that those two are
linked to each other? So what's the frequency of links among the friends of i?
So if we want to look at a given node i and ask what the clustering is for
that node in a given network, then we can say, okay, let's look at i's
neighborhood and look at all the pairs of friends that i has, two different
nodes j and k in that neighborhood. And keep track of, for those possible
pairs, how many of them are actually connected to each other, compared to the
overall number of pairs. And so that gives just a fraction: how many of your
pairs of friends are friends with each other.
And then for average clustering, we can just take that number and average it
across all the different nodes in the network. Okay?
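
To make the two steps concrete, here is a minimal sketch in Python of clustering for a single node and then the average across nodes. The adjacency-dict representation, the function names, and the small four-node network are illustrative assumptions, and so is the convention of assigning 0 to a node with fewer than two friends.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of i's friends that are themselves linked."""
    pairs = list(combinations(adj[i], 2))
    if not pairs:
        return 0.0  # assumed convention: fewer than two friends counts as 0
    linked = sum(1 for j, k in pairs if k in adj[j])
    return linked / len(pairs)

def average_clustering(adj):
    """Compute clustering node by node, then average over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical network: 1 is friends with 2, 3, 4; only 2 and 3 are friends.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}

print(local_clustering(adj, 1))   # one of node 1's three pairs is linked
print(average_clustering(adj))
```

Here node 1 has three pairs of friends and only the pair (2, 3) is linked, so its clustering is 1/3, while nodes 2 and 3 each have a single pair of friends that is linked.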
So that's a particular measure of clustering.
And there are different ways to measure clustering.
What we just did was the average.
So first calculate it for a given node i, and then average across all the
different nodes. And what that does is weight this clustering node by node.
Another way to do this would be instead to look at overall clustering.
So look at all possible nodes and the pairs of friends that they have, and
ask, overall in the whole network, every time we've got a particular situation
which looks like this, where i is linked to both j and k, what's the chance
that j and k are connected to each other?
And so instead of first doing this node by node and then averaging, this is
done overall: we're comparing, out of all the possible triples in the network
where we see them connected in a situation like this, what's the frequency
with which the remaining link is present?
So this is overall clustering. And these numbers can be different.
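
As a rough sketch of the overall measure, the code below pools every (node, pair of friends) triple in the network and asks what fraction of them have the pair linked. The function name and the small example network are assumptions made for illustration.

```python
from itertools import combinations

def overall_clustering(adj):
    """Share of all (i, j, k) triples, with j and k both friends of i,
    in which j and k are also linked to each other."""
    linked = total = 0
    for i in adj:
        for j, k in combinations(adj[i], 2):
            total += 1
            if k in adj[j]:
                linked += 1
    return linked / total if total else 0.0

# Hypothetical network: 1 is friends with 2, 3, 4; only 2 and 3 are friends.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}

print(overall_clustering(adj))
```

In this small network there are five such triples and three of them are completed, so the overall number is 3/5, whereas averaging node by node (counting node 4 as 0) gives 7/12.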
So whichever way you measure it, whether you're weighting it by node or doing
it over all possible triangles in the network, it can give you different
answers. So just as an example, let's suppose we had a situation which looked
like this, where we have a given node here at the center. And this node has
groups of friends in threes that are all friends with each other, but aren't
friends across these different groups of three.
So we keep adding these different groups of three, and what do we find?
In terms of average clustering, this is going to tend to one.
So, for instance, out of node nine's friends, every pair of friends that nine
has know each other. And that's true for ten as well, and eight.
So as we look at most of these nodes, they're actually clustered at 100%.
All of their pairs of friends are friends with each other.
But when we look at node 1, very few of 1's pairs of friends are going to
actually be friends with each other.
And interestingly enough, if you keep adding more and more groups like this,
a lot of the possible triangles in the network are going to be triangles
which go through 1, and most of those are not completed, so the overall
clustering can be much, much smaller than the average clustering in a network
like this. And so depending on what you're measuring, whether you are doing
it node by node or whether you're doing it overall by looking at possible
triangles and then asking whether they are completed, you can get different
answers. They measure different things, and it's important to keep that
straight.
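
The example can be checked numerically. The following is a sketch, assuming the picture is a center node 1 linked to every member of several groups of three, with each group fully linked internally and no links across groups. The builder function and its parameters are hypothetical names.

```python
from itertools import combinations

def build_hub_with_groups(groups, size=3):
    """Center node 1 plus `groups` cliques of `size` friends of node 1."""
    adj = {1: set()}
    next_id = 2
    for _ in range(groups):
        members = list(range(next_id, next_id + size))
        next_id += size
        for m in members:
            adj[m] = {1} | (set(members) - {m})  # center plus own group
            adj[1].add(m)
    return adj

def pair_counts(adj, i):
    """Return (linked pairs, all pairs) among the friends of i."""
    pairs = list(combinations(adj[i], 2))
    return sum(1 for j, k in pairs if k in adj[j]), len(pairs)

def average_clustering(adj):
    values = []
    for i in adj:
        linked, total = pair_counts(adj, i)
        values.append(linked / total if total else 0.0)
    return sum(values) / len(values)

def overall_clustering(adj):
    linked = total = 0
    for i in adj:
        l, t = pair_counts(adj, i)
        linked += l
        total += t
    return linked / total if total else 0.0

for g in (3, 10, 100):
    net = build_hub_with_groups(g)
    print(g, round(average_clustering(net), 3), round(overall_clustering(net), 3))
```

Under these assumptions, with g groups the overall clustering works out to 8/(3g+5), which shrinks toward zero as groups are added, while the average stays close to one because every node other than the center has clustering of 100%.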
Now one thing that's going to be important in this setting is comparing this
to what happens in a network formed uniformly at random.
If we ask what the clustering is in a uniformly random network, well, this
is simply going to be p. Any time we look at a connection like this and ask
what's the probability of this link being present, the probability of this
link being present ignores all the rest of the information; it was just
formed with some probability p.
So the clustering is going to be p: regardless of whether we look at average
or overall, we expect to get an answer of p for what that number is.
And so if we're looking at very, very large networks, and people have a
relatively small number of friends compared to the overall network, then p
is going to be going to 0, and so clustering in a Poisson random network, or
an Erdos–Renyi random network, this G(n,p) kind of network, is going to go
to 0 as n grows, if p is actually getting small, which will often be the
case in a lot of the settings we're going to be interested in.
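
A small simulation can illustrate this. The sketch below draws G(n,p) networks with a fixed expected degree, so p shrinks as n grows, and prints the overall clustering next to p. The seed, the expected degree of 10, and the network sizes are arbitrary choices for illustration, and in any single draw the clustering only approximates p.

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Link each of the n*(n-1)/2 pairs independently with probability p."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def overall_clustering(adj):
    """Share of (i, j, k) triples with j, k friends of i where j-k is linked."""
    linked = total = 0
    for i in adj:
        for j, k in combinations(adj[i], 2):
            total += 1
            linked += k in adj[j]
    return linked / total if total else 0.0

rng = random.Random(0)      # fixed seed, chosen arbitrarily
expected_degree = 10        # hypothetical value, held fixed as n grows
for n in (100, 200, 400):
    p = expected_degree / (n - 1)
    c = overall_clustering(gnp(n, p, rng))
    print(f"n={n}  p={p:.3f}  overall clustering={c:.3f}")
```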
So what that tells us is that random networks are going to tend to have very
low clustering if links are formed uniformly at random.
And then we can look at what we actually see in data.
And when we look across a variety of different kinds of data sets, we tend
to see numbers which are much higher than would have occurred at random.
So in a study of prison relationships by MacRae in 1960, clustering is about
0.31. It would be about 0.013 if you do the following calculation: look at
the same expected degree, but instead use the G(n,p) model. Then basically
about 1.3% of the links are present, and so your clustering should be about
1.3% if it was uniformly random, and yet it's 31% in the data.
So that tells us that the network looks dramatically different than what
would have happened if you'd put these links down uniformly at random.
Co-authorships: 15% in math co-authorships.
Here you see that the p is extremely tiny.
These are large graphs with a lot of mathematicians never having
collaborated together.
In biology it's 0.09, so again here you see much higher numbers than you
would have seen at random. For the World Wide Web, if you look at it without
paying attention to direction, you're going to get about 11%, again with a
much smaller number at random. If you look back to our data from the
Florentine marriages, and in this case here I've included the business
dealings as well, so this is Padgett and Ansell's data from the 1430s, here
you get a clustering of about 0.46; at random it would be about 0.29.
So that's another situation where we've got substantially higher clustering
than at random.
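
The comparison being made here is simple arithmetic, and a short sketch may help. It assumes the random benchmark is the G(n,p) link probability that matches the expected degree, and it uses only the two pairs of numbers quoted above, with 0.013 standing for the roughly 1.3% benchmark in the prison data. The function names are hypothetical.

```python
def gnp_link_probability(n_nodes, expected_degree):
    """The p that gives each node the stated expected degree in G(n,p)."""
    return expected_degree / (n_nodes - 1)

def excess_ratio(observed, benchmark):
    """How many times larger observed clustering is than the random benchmark."""
    return observed / benchmark

# (observed clustering, random benchmark) as quoted in the lecture
reported = {
    "prison relationships": (0.31, 0.013),
    "Florentine marriages and business": (0.46, 0.29),
}
for name, (observed, benchmark) in reported.items():
    print(f"{name}: {excess_ratio(observed, benchmark):.1f}x the random benchmark")
```

The gap is far larger in the prison data than in the Florentine data, where the network is small and the random benchmark is already fairly high.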
So this is another property of networks. This has been a more local property
of networks, looking at how the links relate to each other, not just how
they're distributed over the network, and so forth.
So we've taken a look at a variety of different measures. We're going to now
begin to look at putting nodes in context and other kinds of things:
additional definitions that will help us go forward in keeping track of
networks, and talking about their properties and their characteristics in a
meaningful way.