Hi, in this set of lectures we are talking about networks. In particular, we are
focusing on three things, we're talking about the logic of networks, which is how
they form, going to talk about the structure of networks, which is how
connected are nodes from one another, how far is it from one another to another node
and finally we're going to talk about the functions of network. What functionality
does a network have? Maybe one built-in, like sort of emerged from the structure.
Now in this lecture, we're gonna talk about the structure of the network. So
we're gonna define a bunch of terms, find where the network is in terms of nodes and
edges and we'll just define and describe a bunch of measures in network that help us.
For a better understanding of how networks differ from one another. So when you think
of a network, you can think of a bunch of things that we're gonna call nodes, which
are points, and then lines connecting those, and those are gonna be edges. So a
network is really just a set of nodes and a set of edges. Now those edges can be
either directed, and that means there's like a little arrow, or they can be
undirected where it's just a straight line. So an undirected node. The network
would be something like. [sound]. So an undirected network would be something like
the 50 United States. So each node would be a state, and then an edge would just be
whether those two states share a boundary with a common border. So there's no
direction there. If Ohio's connected to Michigan, then Michigan is connected to
Ohio. In a directed network, then that little arrow that you see, that means
that. One node points to another node, but it might not necessarily go in the other
direction. So if you take a college campus, and you look, you ask whether one
student looks to another student for fashion advice. It may very well be that
person A looks to person B, but person B doesn't look to A. So in a directed
network, you could have an arrow going in one direction, or you could have arrows
going in both directions. So a lot of social networks are directed. But many
networks, the networks were gonna focus on are gonna be undirected networks, just
because they're easier to work with. So then you've got structure we're gonna
focus on four measures: degree, which is how many edges each node has on edges,
path length, which is how far it is from each node to another node. Connectedness,
which is whether the entire graph is connected to itself. And then, finally,
clustering coefficient, which will give us some understanding of, like, how tightly
clustered are the, are the edges? So if A is connected to B, and B is connected to
C, is A connected to C? Okay, so let's get started. Degree of the easiest one. If you
take a node, the degree is just the number of edges connected to that node. So if a
node looks like this, its degree is one. If a node looks like this, it's degree is
four. So it's just how many, if you're a social network, it's how many friends do
you have? If you're a business, and you're looking at, and each edge represents
working [inaudible]. Another firm and it's a case sort of how many other firms do you
work with, so degree is just really a sense of how connected you are to other
parts of the node. The degree of a network is the average degree across all nodes. So
if I take the entire network and I add, count up the degree of every single node
and than average by the number of nodes, that's what we'll think of as the degree
of the network. So a high degree network will be one where lots of people are
connected to lots of other people. A low degree network will be one where there's
very few connections between people. So here's a network, right here. And if I
look at this green node, I can ask what is it's degree. And you see that it's
connected to one, two, three people. So the degree of the green node. Is equal to
three. Now, if I wanna figure out the degree of the entire graph you can
[inaudible] I'm gonna go through and count up each person. So, this person has a
degree of one. This person has a degree of one. This person has a degree of three.
And I can go through and count all the way up. Now, that seems like that takes a lot
of time. There should be a better way to. Figure out the average degree for a
network, and in fact there is. Cause all I have to do is count the nodes. So, in this
case, there's one, two, three, four, five, six, seven, eight, nine, ten nodes, and
then I can just count the number of edges. There's one, two, three, four, five, six,
seven, eight, nine edges. So, ten nodes, nine edges. Now each edge connects two
nodes. So, what I've gotta do is just multiply those edges by two. So if I have
two, two, two times the number of edges divided by the number of nodes, that'll
give me my average degree. So, in this case two times nine is eighteen, divided
by ten gives me an average degree of. 1.8, now when you think of moving along those
edges, you can call the people you're connected to, your neighbors. So if this
is a node here, and he's connected to these three people, then these three
people here are the neighbors of the node. And what we can think about then is.
People who are connected have more neighbors, and later on, we're gonna
extend this concept to neighbors from sort of your immediate neighbors, to neighbors
who are further away on the graph. Well, here's an interesting result. So we only
introduced one measure, which is degree. And now we're gonna get a really, sort of
fascinating result. And that is, the average degree of neighbors of nodes will
be at least as large as the average degree of the network. So, typically, it'll be
larger, but it has to be at least as large. What does this mean in less
technical terms? It mean, most people's friends are more popular than they are. So
literally, it means that. But if you look at your friends, they, on average, have
more friends than you have. But what does C have [inaudible]. So let's look at this
network. A. Has a degree of two. B has a degree of three. C has a degree of two.
And D has a degree of one. So if we add all that up, we get eight, divided by four
gives us two. Now we also could have done this by just counting up the edges which
are four, multiplying that by two, and dividing by the number of nodes, and that
would also give us two. So what we get is we get on average, people have two
friends. Well now we can ask what about friends of friends? Well, let's start with
D. D has one friend. And that's B, and B has three friends. Well let's, let's walk
through the whole thing and see how this works. Let's start, again we start with D,
they have one friend, which is B, who has three friends. C has two friends, A and B.
A has two friends, B has three friends, that's an average of 2.5. If we look at B,
B has three friends. A, which has two, C, which has two, D has one, that's 1.66. And
if we look at A, A has two friends, B and C and they also have an average of 2.5.
So, if I add all this up, what I end up getting is that this is five, eight, 9.66.
Divided by four, which is, you know, approximately 2.4. So, people's friends of
friends have 2.4 friends, whereas people on them, by, on themselves only have two
friends. What you see is your friends have more friends than you do. That's just a
fact about networks. Next measurement we want to get to. Path length. The path
lengths from A to B is just the minimal number of edges you have to traverse to
get from A, person A to person B, from node A to node B. So, I've gotta node here
and a node here, but there's no direct path. Get to those two, you have to take
one, two edges, and their path length is two. So, in this particular graph, here's
this green node. To get to this green node, I've gotta take one, two paths, two,
two edges. So the path length is two. We can also then compute the average path
length for an entire graph. So now you take all pairs of nodes, and figure out
the path length. So let's take this graph. Let's think of, there's, what are there?
There's A and B, there's A to C. A to D, B to C. B to D, and C to D. Those are the
different we could get. So, A and B have a distance of one, A and C have a distance
of one, A and D have a distance of one too. B and C have a distance of one. B and
D have a distance of one. And C and D have a distance of two. So when I add all this
up, I get one, two, four, five, six, eight, and there's six total. So that
means the average path length is one and one-third. That's a very short average
path length. In that particular graph, it was connected, which is our next measure.
A connected is just a graph where you can get from any node to any other by walking
along edges. So this would be, again, a picture of a connected graph, 'cause you
can get from any node to any other node just by walking across the edges. There's
also disconnected graphs. So here's a, a graph of, where there's six nodes and you
can't get from this set of nodes to this set of nodes. So it's disconnected. So in,
in a graph like this, it's interesting because if it's not connected, any
information that's over here with A, D, B, and E, might not get over to C and F. So
disconnected nodes, information's not gonna flow between them, which could be a
bad thing. Now if you think about disease, this could be a good thing because if
there's a new disease in this population, it's not gonna cross this barrier and get
over to this population. So depending on what you're concerned with, connectedness
could be good or bad. Here's an interesting assignment. We've talked about
mark-off processes a couple times. We can think of a mark-off process like a
network. In fact our pictures sort of look like networks. So you can think of there
being here's four states of a mark-off process and then I can just draw little
arrows showing the probability of movement from one state to another. Now this is a
directed graph, and you could have, you could be that it's close as you can get
from B to A with property 0.3 and we can just throw that in. So, one way to
represent a mark-off process is with a network because you can just write down
the states and then draw little directed arrows showing the probability from moving
from one. One state to the next. So again, you start seeing all of our models get
related to one another in interesting ways. Last measure, clustering
coefficient. This is gonna be the percentage of triples of nodes that have
edges between all three nodes. So if I think of three nodes A, B, and C, it could
be that they're connected like this, and then we don't have the full triple. Or it
could be, that A is connected to B. B's connected to C and C's connected to A. You
get the full triple so we can fill in the triangle. You can ask, what percentage in
the possible triangles can you fill in? Let's go back and think of a simple graph.
So here's a graph, and you can ask, as I start drawing lines, what happens to the
clustering coefficient? So if I draw these three lines, I can ask, how many triangles
could there possibly be? And there could be four. There could be this triangle one,
this triangle two, this triangle three, and this triangle four. But what we see
when we look at this graph is that none of those triangles are filled in. So this is
gonna have a clustering coefficient of zero. What about this graph? Same number
of edges but now we've got one triangle to fill in so this is going to have a
clustering coefficient of one over four. What about this graph? I've got one
triangle here, and one triangle here, so this is going to have a clustering
coefficient of two over four. And then finally, if I have a completely connected
graph, I've got, this triangle's filled in, this triangle's filled in, this
triangle's filled in, this triangle's filled in, so I have four over four, which
is a clustering coefficient of one. So, if you. Think about that, alright? You can
ask all sorts of really interesting questions. So here's a simple quiz
question you might ask. Draw two Connected graphs each that has six nodes and six
edges, but have this different cluster coefficients. Now the reason we ask an
exercise like this is to get across the idea that, number of edges sort of tell
you how connected the graph is but it doesn't tell you how clustered it is. So
it's possible to have, two different graphs with the same numbers of nodes same
number of edges but different clustering coefficients. So here is graph one, and if
we look at it there's only one triangle filled in. So this clustering coefficient
is gonna be one. Over the number of triangles. Well how many triangles are
that we get. One, two, three, four, six nodes and we ask a triangle consists of
three. That means there's six nodes that we can pick first, five we can pick
second, four we can pick third, but the order doesn't matter. Three times two
times one doesn't matter. So that means we get twenty triangles. So this [inaudible]
one over twenty. Well here's another graph with six nodes and six edges but none of
the triangles [inaudible]. So this is going to have a coefficient of zero over
twenty. So two graphs. Same number of nodes, same number of edges, different
clustering coefficient. So what have we learned? We've learned that there's
structure to graphs. We could measure degree which is on average how many nodes
is another node connected to. We could talk about path length, which is how far
is it from one node to another node. We could talk about connectedness, is the
whole graph connected. And we can talk about clustering coefficient, which is how
many triangles, of the possible triangles, how many of those are filled in. Now these
are statistical measures. So let's talk about, why would we care about these? Why
might these be important? Well, think about degree, what is it telling you? It's
telling you the density of connections, in a way. It's telling you just how connected
is this graph? So you can think of this as some sort of proxy, possibly, for social
capital. So if you looked at a community, and there were very few connections
between people, then not many people know other people. So remember we talked about
Bob Solo saying social capital can't be just this abstract thing. We need some
measures. One measure could be, how connected are people to one another?
What's the average degree of the social network? You can also think about speed of
diffusion. If there's a piece of information up there, how fast is it going
to spread in this network? Well, one thing that'll determine that is how connected
people are. How many nodes people are connected to. So, somebody's connected to
100 people and you tell them something and they tell all 100, it's gonna spread
really fast. If I'm only connected to two people and you tell me something, then
that information is gonna spread fairly slowly. What about path length? Why do we
care about path length graph? Well think of an airline graph. It's gonna tell you
the number of flights your gonna need on average. So if you wanted to fly from
Miami to L.A. And the average path length was 1.1 for the airline network graph,
you'd know that you?re probably gonna get a direct flight. If the average path
length was seven then you'd know oh my goodness this is gonna take me a lot of
flights to get from point a to point b. It also tells you social distance. So suppose
you run an organization, and you've got a network. And you know that the average
path link between employees is six. That means randomly picking two employees, it's
gonna take six other employees, or five other employees to get from employee A to
employee B. That means that information knowledge, ideas, are not gonna spread
very fast within your organization. And you may wanna try and figure out ways to
reduce that path link. And last, the likelihood of information spreading. If
the path length is really long, it's just not that likely that information is gonna
make it nine steps, ten steps, eleven steps, if people are that disconnected.
What about connectedness, our next thing? Well, connectedness for Markov processes,
it matters. [inaudible] Markov process, we had to be able to get from any state to
any other state. If the nodes are the states, and it's not connected, then we
can't apply the Markov convergence theorem, because that was one of our
assumptions. You can get from any state to any other. If you think about, the
abilities of a terrorist group. You might still think all these people are really
upset, with, the world in some way. And, and that sort of a terrorist organization
is people who wanna do, curiosum tasks that [inaudible] we don't want them to
carry out. If their network is not connected. Then it's going to be difficult
for them to carry out these activities. If it is connected, then they can pass
information and can possibly carry things out. So things like path-link degree in
connectedness are going to determine the functionalities of organizations that do
good as well as organizations that do bad. Now, also connecting [inaudible] measures
if you think about things like the internet or power failures, the electric
grid. If the electric [inaudible] becomes disconnected, you disconnect from the grid
and you don't have any power. And finally, living in terms of information, if people
are disconnected from other people there gonna be isolated in terms of information.
So they may not learn things, they may not figure things out. And then finally the
clustering coefficient. Class stream coefficient. Can be used to figure out how
redundant or how robust a network is. So if you pulled one person out. So think
about it, if you've got a little triangle here between A, B, and C. If A gets wiped
out for some reason, B and C are still connected. So, it gives you a sense of how
robust or how redundant the network is. This is also sometimes used as a measure
of social capital, cause it means there's these, these tight links between people.
And one last really interesting thing about The custom co-efficient can be used
to capture how likely an innovation is to be adopted, in particular possibly a new
word. So if I've got three people that are all connected, A, B and C. And A uses some
new word. And B hears it and then tells it to C. And then C uses the same new word,
and A hears it. You get this sort of autocatalytic set where the information
get passed through the loop, and it now seems part of the new normal. A says,
well, I said this, now I heard C say it, so this must be something that people say.
Whereas, if A wasn't connected to C, when A said something new, it could just sort
of fan out, and it may never come back to A. And so A may be less likely to maintain
the innovation. So these clusters [inaudible] creating feedbacks can allow
things to be maintained within a network. One last thing. [laugh] And I've used all
these measures there's still a sense that networks that a picture is worth a
thousand words. And one reason I think for the rise of networks so much is because of
the fact that we now can graph these networks in really interesting ways. So
let me show you the power of pictures. Here are three graphs. The first one is
the 50 US states, and there's an edge, those are the nodes. And there's an edge
if they're adjacent. The second one is a group of 48 universities. And then it
shows, there's a, an edge between them if they played football against each other,
[inaudible] American universities. And then the last one is an airline, so each
node is an airport. And [inaudible], a network edge is just gonna be whether the
airline flies from that one network to another one. Here are the statistics on
those three graphs. So graph one has a degree of 10.8, a path length of four.
Graph two has a degree of nine, a path length of six. Graph three has a degree of
ten and a pass length of four. So they're all about the same. So look at these
statistics. There's not a huge difference between the three networks. Let me show
you the pictures. Here's graph one. What do you think this is? Well, this is
clearly an airline network. There's a couple hubs here, and there's all these
cities that the airline flies into. Here's graph two, what is this? Well, this is the
50 United States. Up here are Alaska and Hawaii, who are disconnected. And it
looks, in fact, a lot like the United States. And then here is graph three.
These are the football conferences, one, two, and three. And what you can see is
the teams all playing each other within conference, with an occasional game
outside the conference. So what we see is we see very different structures between
the airline network, the states. And the University's playing football, even though
the statistics, the first two statistics, degree and path link were about the same.
Now, where we would see a difference statistically is in clustering
coefficient. So, if we look at this graph, the clustering coefficient is going to be
very low, in fact it's going to be zero almost. There's very few triangles in this
graph. In graph two, there's a little bit of a clustering coefficient, but in graph
three the clustering coefficient is really high. So the sense in which, maybe instead
of saying a picture's worth a thousand words, we might want to say a picture's
worth at least three statistics, or maybe four statistics. Cuz you need quite a few
statistics. Statistics before you can really start distinguishing between
graphs. So degree won't be enough, path link won't be enough, but if you throw in
clustering coefficient, then maybe you start making more sense of the graph.
Okay, so that's structure. We now understand, we can think of networks as
having nodes, edges, degrees, path length, and connected in a [inaudible] cluster
coefficient. So this is a language for describing different network structures.
What we're gonna do next, is start with the logic which these networks form.
Alright? Thanks.