If there are two majority labels,

for example two Ns and two Ys, a tie,

then we just select one randomly.

Otherwise, we always choose the majority.
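The voting rule above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation; the function name `majority_label` is made up for this example.

```python
import random
from collections import Counter

def majority_label(neighbor_labels, rng=random):
    """Pick the majority label among the k nearest neighbors.

    If several labels tie for the top count (e.g. two 'N's and two 'Y's),
    one of the tied labels is selected at random, as described above.
    """
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    tied = [label for label, count in counts.items() if count == top]
    return rng.choice(tied)

print(majority_label(['Y', 'Y', 'N']))       # clear majority: prints Y
print(majority_label(['Y', 'Y', 'N', 'N']))  # tie: prints Y or N at random
```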

Now, we've finished the explanation of k-Nearest Neighbor,

but I would like to mention that

k-Nearest Neighbor is also called the Lazy Learning Algorithm.

Why is it called lazy learning?

Think about it. If we look at the algorithm,

we will find that this algorithm actually does not learn at all, right?

It only starts classifying an instance during the test phase.

Okay? So, it is called lazy learning.

So it does not learn, but what does it do then?

It just compares a new instance without a class label with

all the instances in your dataset with known classes.
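That test-time comparison is all there is to the algorithm. A minimal sketch of this lazy, brute-force kNN, assuming numeric feature vectors and Euclidean distance (the function name `knn_predict` is made up for this example):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Lazy k-NN: there is no training phase. At test time, compare the
    unlabeled query against every labeled instance, then vote among the
    k closest neighbors."""
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

X = [(0, 0), (0, 1), (5, 5), (6, 5)]
y = ['N', 'N', 'Y', 'Y']
print(knn_predict(X, y, (5, 6), k=3))  # prints Y
```

Note that every prediction loops over the whole training set, which is exactly the cost concern discussed next.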

But in this case,

we would ask: how fast is k-Nearest Neighbor?

You see, because it's a lazy learning algorithm

and it does not learn, is it still fast during the testing phase?

As a matter of fact, it is not.

It can be very slow, because when the data is big,

it has to go through all the instances and calculate their distances,

because we never know what the new instance will be.

So, we have to calculate the distances on the fly.

So we need to figure out a way to speed up this k-Nearest Neighbor.

One way, which I already used in this dataset as an example,

is that if there are six clusters,

we just use the cluster representatives as the instances.

Let's say each cluster has 10 or 20 data points;

with 20 data points per cluster,

I have 120 data points here.

But if I just use the representatives of these clusters,

I only have six data points,

say one here, one here,

one here, one here. These are the six.

Now I can do it much faster than with 120.

It's six versus 120:

one representative instance per cluster.
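This representative trick can be sketched as follows. It's a minimal illustration assuming 2-D points, centroids as representatives, and one label per cluster; the function names are made up for this example.

```python
import math

def cluster_representatives(clusters):
    """One representative (the centroid) per cluster.

    `clusters` maps a class label to that cluster's list of (x, y) points;
    this assumes each cluster is pure, i.e. carries a single label."""
    reps = []
    for label, points in clusters.items():
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        reps.append(((cx, cy), label))
    return reps

def nearest_rep_label(reps, query):
    """Compare the query against the few representatives
    instead of against every data point."""
    return min(reps, key=lambda rep: math.dist(rep[0], query))[1]

clusters = {'N': [(0, 0), (0, 2)], 'Y': [(6, 6), (8, 6)]}
reps = cluster_representatives(clusters)
print(nearest_rep_label(reps, (7, 7)))  # 2 comparisons instead of 4: prints Y
```

With six clusters of 20 points each, this is 6 distance computations per query instead of 120.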

So this is one way, but we can also do it the hierarchical way.

If the dataset is really big,

we first compare using these representatives.

Then, after we decide which cluster the instance belongs to,

we zoom in and compare within that cluster. In this example it is enough to stop at the cluster level because the clusters are so pure.

In reality, they are not that pure,

so we zoom in and do k-Nearest Neighbor within the cluster, as we've explained before.
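The two-stage idea can be sketched like this: first a coarse pass over one representative per cluster, then plain kNN inside the chosen cluster only. A minimal sketch assuming 2-D points; `clusters` maps a cluster id to (point, label) pairs, a hypothetical layout made up for this example.

```python
import math
from collections import Counter

def hierarchical_knn(clusters, query, k=3):
    """Two-stage lookup: (1) pick the cluster whose centroid is closest
    to the query; (2) run plain k-NN only within that cluster's points,
    since clusters may not be pure."""
    def centroid(pairs):
        points = [p for p, _ in pairs]
        return tuple(sum(coord) / len(points) for coord in zip(*points))

    # Stage 1: coarse search over one representative per cluster.
    best = min(clusters.values(),
               key=lambda pairs: math.dist(centroid(pairs), query))
    # Stage 2: exact k-NN inside the chosen (possibly impure) cluster.
    nearest = sorted(best, key=lambda pl: math.dist(pl[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

clusters = {
    0: [((0, 0), 'N'), ((0, 1), 'N'), ((1, 0), 'Y')],
    1: [((5, 5), 'Y'), ((6, 5), 'Y'), ((5, 6), 'N')],
}
print(hierarchical_knn(clusters, (5.5, 5.5), k=3))  # prints Y
```

Per query, this costs one distance per cluster plus one distance per point in the winning cluster, instead of one distance per point in the whole dataset.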