Okay. So, this is what

a classification problem might

actually look like in practice.

We have some data that we're looking to classify as

belonging to, in the simplest case at least,

one of two classes:

a positive and a negative class.

So this scatter plot represents some data I might

have collected which has two features,

and each data point has an associated classification.

So, for example, this could be whatever you like,

but maybe you have features

representing the height and

weight of different individuals,

and the prediction you are trying to make is whether

an individual is male or female.

So those would be the two classes,

one of which is positive and the other negative.

But, it could be anything,

really we just have data where we have a binary class,

and we have two features we're trying to use to

predict that class which we call

positive and negative examples.
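
To make that setup concrete, here is a minimal sketch of how such a dataset might be stored. The numbers are made up purely for illustration, the names `features` and `labels` are hypothetical, and which class gets called positive is an arbitrary choice.

```python
# Hypothetical toy data: each entry is (height_cm, weight_kg).
# The values are invented for illustration, not real measurements.
features = [
    (180.0, 82.0),
    (165.0, 58.0),
    (175.0, 77.0),
    (160.0, 54.0),
]

# One binary label per data point: +1 for the positive class,
# -1 for the negative class (the assignment is arbitrary).
labels = [+1, -1, +1, -1]

# Every data point needs exactly one label.
assert len(features) == len(labels)
```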

Here, the examples are drawn as

stars for negatives and

I think, a different marker for positives.

But you can draw it however you like.

In other words, we have some data which

has two different classes and we'd

like to have some function that, given

a new observation, estimates

which of the two classes that point should belong to.

So, what's the simplest algorithm we can

come up with to classify a new data point?

So, suppose we observe some point here denoted as X,

and we'd like to

estimate whether it's positive or negative,

and all we observe about that data point

is the two-dimensional features associated with it.

So, nearest neighbor is

a very simple classification

scheme we can use to solve this.

It says, well, this data point probably has

the same category or class

as other data points with similar features.

So to find the data point with the most similar features,

let's just compute the distance

from this data point, or more

precisely this data point's

features, to all other data points.

We'll find the nearest one,

and the prediction we make says,

this data point should have the same category or

the same label as

the nearest data point in our training set.

So in this case, the nearest point has a negative label,

and we will label this data point as negative.
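
The nearest neighbor rule just described can be sketched in a few lines of Python. This is only an illustration under some assumptions: the lecture doesn't say which distance to use, so Euclidean distance is assumed here, and the function name `nearest_neighbor_label` and the toy training points are hypothetical.

```python
import math


def nearest_neighbor_label(train_features, train_labels, query):
    """Return the label of the training point closest to `query`.

    Assumes Euclidean distance; other distance measures are possible.
    """
    # Index of the training point with the smallest distance to the query.
    best_index = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    return train_labels[best_index]


# Made-up two-dimensional training data with binary labels.
train_features = [(1.0, 1.0), (2.0, 1.5), (6.0, 5.0), (7.0, 6.5)]
train_labels = ["negative", "negative", "positive", "positive"]

# A new point X whose class we want to estimate.
x = (2.5, 2.0)
print(nearest_neighbor_label(train_features, train_labels, x))  # prints "negative"
```

Here the closest training point to X is (2.0, 1.5), which is labeled negative, so X gets the negative label too. Note that this sketch compares X against every training point, which is fine for small datasets.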