When we have an application,

we need to not only choose a vector space,

but we also have to choose a distance function in the vector space.

So the question essentially becomes,

what is a good distance measure?

How do I select this distance measure?

As I said earlier, the distance measure is application dependent.

Depending on the application, L1 distance might be appropriate,

L2 distance might be appropriate, or L-infinity distance might be appropriate.

It's hard to tell ahead of time

without actually looking at the properties of the application.
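
To make the choice concrete, here is a minimal Python sketch of the three distances just mentioned. It is an illustration, not part of the lecture; the function names and the sample points `query`, `p`, and `q` are assumptions chosen so that the distances disagree about which point is closer to the query.

```python
def l1_distance(a, b):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """L2 (Euclidean) distance: square root of summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def linf_distance(a, b):
    """L-infinity (Chebyshev) distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical sample points, chosen only for illustration.
query = (0.0, 0.0)
p = (3.0, 3.0)
q = (0.0, 4.0)

for name, dist in [("L1", l1_distance), ("L2", l2_distance), ("Linf", linf_distance)]:
    closer = "p" if dist(query, p) < dist(query, q) else "q"
    print(name, dist(query, p), dist(query, q), "closer:", closer)
```

With these points, L1 and L2 rank `q` as closer to the query while L-infinity ranks `p` as closer, which is one way to see why the choice depends on the application.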

However, there are certain properties of distance functions that, if they are satisfied,

make our job easier, our life easier.

Some of these properties are called metric distance properties.

And there are essentially four properties that together make up these metric distance properties:

self-minimality, minimality, symmetry, and triangular inequality.

Let's go through them one by one.

Let's start from self-minimality.

Self-minimality means, give me a distance function,

and give me a vector in the vector space.

If I measure the distance in

the vector space from the object to itself, from one vector to itself,

the distance should be zero.

Now, this makes sense.

Right? Essentially this means that when I am comparing an object to itself,

there shouldn't be any difference.

The system should tell me this object is equal to itself.

Now it turns out that,

there are some situations in which this is hard to achieve.

Again we are not going to get into that right now.

We might talk about that later.

But essentially, self-minimality is

a desired property which ensures that a given object matches itself perfectly.

But again, there might be situations in which it is violated.
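
As a rough sketch of what self-minimality asks for, the Python below checks that a distance function returns zero when a vector is compared to itself. The helper `satisfies_self_minimality`, the sample vectors, and the function `one_minus_dot` are illustrative assumptions; `one_minus_dot` is only a made-up example of a function that fails the check, not one of the situations the lecture refers to.

```python
def l2_distance(a, b):
    """L2 (Euclidean) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def one_minus_dot(a, b):
    """Hypothetical dissimilarity: 1 minus the dot product.

    For vectors that are not unit length, comparing a vector to itself
    does not give zero, so self-minimality fails.
    """
    return 1.0 - sum(x * y for x, y in zip(a, b))


def satisfies_self_minimality(dist, vectors, tol=1e-12):
    """Return True if dist(v, v) is (numerically) zero for every sample vector."""
    return all(abs(dist(v, v)) <= tol for v in vectors)


# Hypothetical sample vectors.
sample = [(0.0, 0.0), (1.0, 2.0), (0.5, 0.5)]

print(satisfies_self_minimality(l2_distance, sample))
print(satisfies_self_minimality(one_minus_dot, sample))
```

A check like this only tests the vectors it is given, so passing it is evidence for the property on that sample rather than a proof.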

The second requirement from metric distances is minimality.

Minimality means, give me two vectors that are not the same,

vector S1 and vector S2.

If I measure the distance between them,

that distance should be greater than or equal to the distance of the S1 vector to itself.

If self-minimality holds, we know that this is equal to zero.

What this essentially means is that if I'm comparing two distinct objects,

their distance will be zero or larger.

Essentially what this means is that,

if the minimality condition holds,

an object matches itself at least as well as it matches any other object in the database,

a second property that we like to ensure if possible.
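
The minimality condition can be sketched in Python as a check over all ordered pairs of distinct sample vectors S1 and S2: the distance from S1 to S2 should be no smaller than the distance from S1 to itself. The helper name, the sample vectors, and the violating function `neg_dot` are assumptions made up for this illustration.

```python
from itertools import permutations


def l1_distance(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def neg_dot(a, b):
    """Hypothetical 'distance': negated dot product (larger overlap, smaller value)."""
    return -sum(x * y for x, y in zip(a, b))


def satisfies_minimality(dist, vectors):
    """Return True if dist(s1, s2) >= dist(s1, s1) for every ordered pair s1 != s2."""
    return all(dist(s1, s2) >= dist(s1, s1) for s1, s2 in permutations(vectors, 2))


# Hypothetical sample vectors.
sample = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]

print(satisfies_minimality(l1_distance, sample))
print(satisfies_minimality(neg_dot, sample))
```

Under `neg_dot`, the vector (1, 0) scores -2 against (2, 0) but only -1 against itself, so a different object would match it better than it matches itself, which is exactly what minimality rules out.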

The third property that we often seek from distance functions is called symmetry.

Symmetry means the following.

If I measure the distance from vector S1 to S2,

or if I measure the distance from vector S2 to S1,

I should get the same value.

Now again this sounds intuitive.

If I compare one object to the other,

versus the other object to the first,

I should get the same value.

I have two objects, I measure the distances,

it shouldn't matter which one I'm comparing to which.

If this can be ensured,

then I get some kind of consistency from the system.

If I am using object S1 as my query

and object S2 is returned,

then if I use object S2 as the query,

I will get object S1 from the system as well.

I get consistency. Now again,

it turns out that symmetry is often violated.

So there are many cases in which

the distance functions that we use do not satisfy the symmetry property.

So we basically lose consistency.

But we gain some other things.

Again we will discuss them when the time comes.

But keep in mind that symmetry is a strongly desired property of distance functions.
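
Here is a small Python sketch of a symmetry check. The lecture does not name a specific asymmetric measure at this point; the Kullback-Leibler divergence is used below only as one well-known example of a dissimilarity that is not symmetric, and the helper name and sample distributions are assumptions for illustration.

```python
import math
from itertools import combinations


def l2_distance(a, b):
    """L2 (Euclidean) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kl_divergence(p, q):
    """Kullback-Leibler divergence of discrete distribution q from p.

    Assumes both inputs sum to 1 and have strictly positive entries.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def is_symmetric(dist, vectors, tol=1e-9):
    """Return True if dist(a, b) equals dist(b, a) for every sample pair."""
    return all(abs(dist(a, b) - dist(b, a)) <= tol
               for a, b in combinations(vectors, 2))


# Hypothetical sample: three discrete probability distributions.
sample = [(0.5, 0.5), (0.9, 0.1), (0.2, 0.8)]

print(is_symmetric(l2_distance, sample))
print(is_symmetric(kl_divergence, sample))
```

With an asymmetric measure, the value obtained when S1 is the query and S2 the candidate can differ from the value obtained the other way around, which is the loss of consistency described above.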

The fourth requirement from distance functions is called triangular inequality.

So, as we discussed,

symmetry gives us consistency,

triangular inequality gives us efficiency.