Hello. This lesson will be dedicated to details.

Teeny, tiny details can make your life happy or miserable.

So watch and learn.

In this video, we examine how triangle count method of graph frames works.

Remember that triangle count method computes

the number of triangles passing through each vertex.

First of all, this algorithm ignores edge direction.

That is, all the edges are treated as undirected.

In case there is an edge for vertex A to vertex B,

and an edge for vertex B to vertex A duplicated edges will be counted only once.

How can you get it?

By flipping edges in a way that source of each edge is

less than destination and dropping all duplicate afterwards.

Let's take two example and see how this strategy works on it.

In a short example,

data frame will contain edges (1,2) (2,1)

(1,3) (3,1) (1,4) and all of the sudden (4,3).

Let's go through it row by row and flip all the edges

where condition source is less than destination, is not true.

The edge (1,2) is correct,

so it won't change.

In the edge (2,1) source is bigger than destination,

so source and destination will be flipped and edge (2,1) will become an edge (1,2).

The edge (1,3) is correct,

one is less than three.

The edge (3,1) is not correct cause its destination 1 is smaller than source 3.

So, let's flip them and get in you correct edge (1,3).

The edge (1,4) is correct.

And finally, in the edge (4,3) source and destination should

be flipped and edge (4,3) will be transformed to edge (3,4).

So, then your data frame is all edges flipped,

will be (1,2) (1,2) (1,3) (1,3).

Again, (1,4) (3,4).

Let's drop the duplicates and voila (1,2) (1,3) (1,4) (3,4).

Let's create a graph frame named D using these data duplicated edges data frame.

As source is less than destination for each edge,

then in the new data frame there will be only one warrant of each triangle.

Where the first vertex AD is smaller than the second,

and the second is smaller than the third.

Why? Let's call the first vertex A,

the second vertex B and here comes the surprise,

the third vertex C. If A,

B and C form a triangle,

then there should be edges between A to B,

B to C and C to A.

If there is an edge from A to B that means,

that A is less than B.

If there is an edge from B to C,

then B is less than C. And then,

consequently, A is less than C,

which means that, if there is an edge between A and C,

then it can go only from A to C. Let's formulate

these in DSL language of structural patterns.

So, now, using triangles count method of graph frames,

you will find all triangles using motif finding method

which was described in detail in the previous lesson.

Note that each triangle will be found just once.

After you have found all the triangles,

you should count the amount of times each vertex was mentioned in all the triangles.

You can do it by exploding the triangles data frame grouping

by new col id and by counting the amount of appearances of each id.

Let's call the data frame you received after exposing triangles as a vertex triangles.

The final step will be to join vertex triangles with original data frame g.vertices.

On condition, vertex triangles.id is equal g.vertices.id.

Triangles count algorithm summing up.

First, flip all edges in a way where

source is less than destination and delete all duplicates.

Second, find all the triangles with the motif using pattern.

Further explode triangles and count occurrences of each vertex.

Fourth, joining info about triangles for each vertex with it's original info about it.

So, let's count the amount of shuffles of data

that triangle count method will do with the data from

the original graph frame G. On the first step,

you won't do any of them.

On the second, motif finding will do six shuffles.

We counted them in assessment in lesson two.

On first step, one more shuffle will be done during grouping by ID.

And finally, one more shuffle will be done as the last step.

Joining vertex triangles with original g.vertices data frame.

To sum up, you have learned how

the triangle count method of graph frames works step by step.

And now you know how to estimate the amount of shuffles that will be done during it.