0:00

Hi folks, so now we are going to talk about

models of networks that allow us to capture interdependencies.

And these are often known as exponential random graph models.

So, we went through these different forms of

models, last we talked about stochastic block models.

And now we are going to talk about the popular set

of models which are known as exponential random graph models.

And we are also going to talk about some variations

on these kinds of models, some new types of models.

Which are easier

to estimate than exponential random graph models.

And, so I'll talk to that, after we get to, to the basics.

Okay.

So first of all, a quote from Jacob Jevy Moreno and Helen Hall Jennings from 1938.

And this sort of, you know, gives us the idea that when we're looking at

situations with, we're looking at social interactions and people

are interrelated.

We're not going to be able to look at just binary relationships diads.

We're going to have to be looking at bigger configurations.

And, so that is something that is driven home by their original, quote here.

That a pertinent form of statistical treatment is one which deals with social

configurations as wholes, and not single series

of facts, more or less artificially separated

from the total picture.

And a very interesting thing here, I pulled a, a picture from Moreno in 1932.

So, actually the Moreno is also known as the father of sociometry.

He's a social psychologist, and here, he was actually mapping

out ties between individuals in different houses, in New York.

He was at, at Columbia University in 1930s.

And this was one of

the first sociograms or situations where we've actually

mapped out a network in a graphic form

of a social interactions between individuals and one

of the first ones that I know of.

Okay, so, so lets talk about this class of models.

So, the previous models we have talked about are not great

at fitting data with lots of clustering and other kinds of dependencies.

And in particular, in testing many social and economic kinds of theories,

which will give us reasons for which people might

be, interacting with each other in certain particular, forms.

And, so the idea here is that you know, the link between two

individuals i and j could depend on the presence of a friend in common.

Does, j know k and i know k as well?

That's something we want to capture.

The difficulty with this is, once we allow one link

to depend on another link, then we open a Pandora's

box, where now all the links could be inter-dependent.

So if, if i and j dependent whether they

have a friend in common and the, that friend

depends on whether they have other friends in common

and also depending on whether i and j are there.

Everything gets put back together and so

we have to specify all of the interdependencies.

And Frank and Strauss in 1986 and they started

working with a class of models which became known

as p* models and, and were most of them imported into the social networks,

literature and, and force by Wasserman and, and Pattison in the 1990s.

3:11

And have, have since become known as exponential random graph models.

And let's start by just going through an example of this.

And, and this is sort of the simplest possible complication that we could have

to given, model where instead we had links before.

Now what we're going to do is allow the probability of the network to depend not

only on the number of links present, but

also depend on the number of triangles present.

Okay, so we could say that links with

triangles are more likely to appear than links

3:41

networks without triangles, okay?

So we're just going to complicate things on one very simple dimension.

But one that actually turns out be quite powerful in,

in terms of enriching the picture of, that we have.

And in particular, this idea that we could have that,

a link depends on whether you have friends in common.

That's going to, to give possible precedence to more triangles

than would appear in something like a Erdos-Renyi random graph.

And so, what's the idea behind an exponential random graph model.

Now we

have the, the probability depends on the number of

links we have, the number of triangles times some parameters.

So, if we set this parameter to zero, then then that would

zero this out and, and all,a ll that would matter was links.

But if that's not zero, then we have a situation, where now we have

both links and triangles mattering. In order to make

things into probabilities.

What we're going to do is then, make sure that this thing turns out to be something

between what has to, have to be non-negative and lie between zero and one.

And so the first thing we can do is instead

of just having a probability be proportional directly to this.

We'll exponentiate that thing, so it's always going to be non-negative.

This is a standard trick

in, in statistics, work with the exponential family.

So now we have something which is non non-negative

or generally going to be positive, so the probability of a

graph is proportional to some exponential function of something

about the number of links and the number of triangles.

5:11

And one very powerful theorem by Hammersly and Clifford show that

pretty much any network model could be expressed in the exponential family,

with some counts of graph statistics.

So it might not be links and triangles,

it might be links, triangles and numbers of three

prong stars you have, it could be that it

depends on the number of cliques of size six.

It could be of quite complicated list of statistics.

But the theorem that they had said that pretty much any network

model that you can think of could be expressed in this form.

And just as a

check on this, let's go back to our

Erdos-Renyi random network model and see how this works.

So, let's let p be the probability of a link.

L of g is the number of links in a given graph.

What's the probability of g in that world? Well, under the Erdos-Renyi the chance

that you had exactly this network full of links, is the probability p that all of

the links that are present formed and 1 minus p

that all of the links that did not form didn't form.

Okay.

So, that would be the probability of

getting exactly a particular network with L links.

6:28

Then let's rewrite this, so we'll pull out the L.

So, I'm just repackaging this.

So that we can write this as p over 1 minus p raised to the

L, of g times 1 minus p to the n times n minus 1 over 2.

And, let's write this now in an exponential form.

So another way to write this is, this is the exponent of the log of p over

1 minus p times L of g minus some extra

term which doesn't involve the number of links at all.

So, what does this look like?

This looks like an exponential function of a statistic of the graph,

where in particular, this statistic is the number of links in the graph.

So you can write the probability of an Erdos-Renyi graph

as in this form, where now this looks like log

of p over 1 minus p. Okay.

So we can see that, I mean, this is

not a proof of the Hammersly-Clifford theorem by any means.

But what it does give us an idea that you can convert

a lot of other kinds of models into the exponential random graph family.

And that's going to be quite useful.

7:37

Now one other thing to, to make this a

probability of course the, these have to sum to one.

So, in particular that means that we have to normalize

by what the probability of all the graphs are, to make

sure that the probability of one particular graph when we sum

across all the graphs, this is going to sum up to one.

So the denominator of this probability is going to have to be the

sum across all possible graphs of what the probability of those, relative probability

of those graphs would be.

So now, if we sum this thing across all graphs, the

sum of now the P of g's across all g primes,

this thing will now equal to 1 because we get the

sum to be the same on the top and the bottom.

Okay.

And also you can write this we know we

can bring this denominator which doesn't depend on g anymore.

Into the numerator as minus

some constant it's just going to be

the corresponding constant which captures its denominator.

So, what we've got is a, an exponential random graph

family, which allows us to capture different types of statistics.

And, that's going to be quite powerful in allowing us to

fit a lot of things and incorporate a lot of things.

The main challenge is going to be in estimating these

kinds of models.

So now next, in the next video, I'll talk a little bit about how

to work with these models to estimate things and to begin to fit these.