The problem with social network data or actually pretty amazing, it's not really

a problem at all, is that when you think about the world that we live in especially

if we restrict to people who are active in these online social networks.

We're all not so far apart from one another in terms of this friends of

friends of friends of friends behavior.

Turns out that for most of these social networks you'll have one giant component

that most of the vertices in the graph belongs to.

And so most of the vertices in the graph will be

just some path away from another vertex in the graph.

And so if we just look at the connected components of the graph,

we won't really get a very fine description of that inner structure.

We'll just have a whole bunch of advertise's grouped in one of these giant

components, it's actually a technical term which is kind of funny.

And then a few outliers who aren't really connected to the other people.

Which is interesting and if you want to think more about drawing components it

might be the basis of an interesting project.

But for our example right now, we'd like to find those inner communities and for

the drawing components not really going to help us out.

But maybe if we think about the meaning of community as not just I'm a friend of

a friend of a friend of a friend.

But somehow capturing this fact that within a community we should have more

links to one another than to people who are outside the community.

Maybe if we can formalize that notion of being closely connected to other people in

my community, then we can write that down as a graph property that

then we could algorithmically look for in our social network.

And this was done and in fact, there's different ways of describing what it means

to have more links within a particular community than outside to it.

And so different definitions of this connectedness and

clustering behavior might lead to slightly different algorithms.

So we have a reference for you that might be really useful

as you're exploring these questions that you want to ask about social networks and

look for algorithms that might give you some of these answers.

And this book by Easley and Kleinberg has a very nice overview of

the research that's really been done in social networks and

algorithmic questions about social networks up until this point.

And one example that they present in the context of community finding

is this study that was done quite a ways back by Wayne Zachary.

And in this study, he was analyzing the relationships amongst a karate club

on a campus not UC San Diego, but on a different campus.

And noticing that for this particular karate club,

you could find a difference between the people who are represented by vertices in

white from those who are represented by vertices shaded as red.

And it turned out that there were more edges interconnecting the ones that were

shaded in white from those that were shaded in red.

And it turned out that this club really ended up having a schism.

It separated into two groups, two separate karate clubs some time after

this network was analyzed and it turned out that at the heart of the white

shaded vertices was one person who was the original founder of the club.

And at the heart of the red shaded vertices was actually the instructor in

the club, the karate instructor.

And it turned out that these two people had somewhat different visions for how

the club would carry out its activities and what the mission of the club was.

And so they each ended up splitting up into two separate clubs.

And when the researchers followed up with this community of people,

they noticed that all of the white shaded vertices actually ended up going with

the founder of the club.

And all of the red shaded vertices indeed went with the instructor with

the exception of one person.

And there was a psychological reason for that person as well.

So it's really amazing how just the graph structure of the relationships between

individual and people modeled in this

abstract way can give us insight into how the network will evolve over time.

And so what we like to do is have a description of what does it mean for

those white vertices to be grouped more and the red vertices to be grouped more.

And that definition will lead to algorithms that can figure

out those clustering.

We'll talk more about those next week.

But for now, let's think about applying that algorithm

to that UC San Diego data in 2005.

And what we see is that using that algorithm,

we can find sub communities within that big giant component

of people who seemed to have more of a clustering behavior to one another.

And then we can ask ourselves a question, what brought these people together?

And we can go back to the people on campus and ask them.

So did you go to the same high school?

So are you both engineering students?

And from the graph structure give us insight about how these people interact?

So how do we do this?

How do we answer these questions?

And at the heart of these interesting questions,

I really foundational algorithmic computer science mathematical notions.

So we're modeling the social network piece of graph.

We, as part of the algorithm, we'll do a breadth first search

which should hopefully be familiar to you from our earlier courses.

We'll be using really interesting data structures like priority queues and

queues to implement the algorithms that are necessary for

separating out a big graph into its subcommunities.

And so hopefully, you're as excited as I am about the kind of questions you can

answer about social networks using the tools that we've developed in this course.

Sorry, developed in this specialization for

courses that you can work through up until now to get to this point.

And in the next video, what I'm going to do is outline the steps that you'll take

in developing your own project throughout this capstone course.