Data repositories in which cases are related to subcases are identified as hierarchical. This course covers representation schemes for hierarchies and algorithms that enable analysis of hierarchical data, and provides opportunities to apply several methods of analysis.

Associate Professor at Arizona State University in the School of Computing, Informatics & Decision Systems Engineering and Director of the Center for Accelerating Operational Efficiency

K. Selcuk Candan

Professor of Computer Science and Engineering and Director of ASU’s Center for Assured and Scalable Data Engineering (CASCADE)

A second very commonly used measure

to quantify similarity or distance between time series is the correlation similarity.

Correlation similarity is defined as follows.

What is correlation? Let's first recall what it means.

Correlation is essentially a way to measure

whether two functions, or

two variables, have similar patterns of increase and decrease.
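As a reminder of what this measures, here is a minimal sketch of one common way to compute correlation between two equal-length series, the Pearson correlation coefficient. The transcript does not say which formula the slide uses, so treating it as Pearson is an assumption, and the function name `pearson` is made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumption: the lecture's "correlation" is the Pearson coefficient.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its own series mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A result near +1 means the two series rise and fall together, a result near -1 means one rises when the other falls, and a result near 0 means there is no consistent linear relationship.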

So in this slide,

we have three different function pairs or three different series pairs.

In the first example,

we have observations that are

recorded for two series

that are increasing similarly.

That is, if one of the series has a high value,

the second series also has a similarly high value.

This essentially would mean that either the two series are

increasing similarly or they are decreasing similarly.

So basically, they have what we call a positive correlation.

So, if you go back to the original example, after 2011,

you will see that the keyword "Machine Learning"

and the keyword "Deep Learning" are both increasing similarly,

simultaneously, which essentially would mean that they are positively correlated.

They have a positively correlated behavior.

In the second example here,

we see negatively correlated time series.

In this case,

let's take a look at these two entries here.

What's happening is that when series one has a higher value,

series two has a lower value.

Which means that

when the first series is increasing,

the second series is decreasing.

It means that these two series behave very differently from each other.

So, if you go back to the example here,

and look at the time frame,

say between 2009 and 2011,

you will see that

the keyword "Machine Learning" still has a somewhat decreasing behavior,

whereas the keyword "Big Data" is increasing at that time.

In that time frame,

these two keywords have negatively correlated behavior.

When one is increasing,

the other one is decreasing.

Usually, when two time series are negatively correlated,

they essentially are recording opposite things.

We also have uncorrelated time series.

In the case of uncorrelated time series,

we don't have a strong relationship one way or the other.

So, in this case,

we might see that when one variable is increasing,

the other one is also increasing.

But we might also see the opposite.

We might see that,

between these two series,

when one of the variables is increasing,

the value of the other variable is decreasing.

So, these are essentially uncorrelated time series,

time series where, when one of them has a high value,

we cannot necessarily predict whether the other one will have a high value or a low value.
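To make the three cases concrete, here is a small sketch that computes correlation for a positively correlated pair, a negatively correlated pair, and an uncorrelated pair. The numbers are made-up toy values, not the keyword data from the slides, and the Pearson coefficient is assumed as the correlation measure.

```python
import math

def pearson(x, y):
    """Pearson correlation (assumed measure) of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical toy series chosen only to illustrate the three cases.
base = [1, 2, 3, 4]
rising = [2, 4, 6, 8]      # goes up whenever base goes up
falling = [8, 6, 4, 2]     # goes down whenever base goes up
mixed = [1, -1, -1, 1]     # no consistent relationship with base

r_pos = pearson(base, rising)    # close to +1: positively correlated
r_neg = pearson(base, falling)   # close to -1: negatively correlated
r_none = pearson(base, mixed)    # close to 0: uncorrelated
print(r_pos, r_neg, r_none)
```

In the last pair, knowing that `base` is high tells us nothing about whether `mixed` is high or low, which is what a correlation near zero expresses.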

Uncorrelated time series. So, essentially,

this shows us that when we are comparing time series,

we have different ways to quantify their similarity or difference.

Which one we select

depends on the application.

In the case of Euclidean distance,

we measure how close the two time series

are to each other, but we don't necessarily see whether, when one is increasing,

the other one is decreasing or vice versa.

We cannot tell that. We can only say,

these two time series are similar to each

other in terms of their amplitudes or they are far apart from each other.

This is the only thing that we can say.

If you look at the correlation, however,

correlation can tell us whether, when one of them is

increasing, the other one is decreasing or not.

So, we can actually get some more information about

the overall patterns, but in that case

we may not be able to say in absolute terms whether they are close to each other or not.
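The contrast between the two measures can be sketched with a small example. The series below are made up for illustration, the helper names `euclidean` and `pearson` are hypothetical, and the Pearson coefficient is assumed as the correlation measure.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation (assumed measure) of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical toy series.
a = [1, 2, 3, 4]
shifted = [101, 102, 103, 104]   # same pattern as a, very different amplitude
mirrored = [4, 3, 2, 1]          # similar amplitude to a, opposite pattern

# Far apart in absolute terms, yet perfectly correlated.
d_shifted = euclidean(a, shifted)
r_shifted = pearson(a, shifted)

# Close in absolute terms, yet negatively correlated.
d_mirrored = euclidean(a, mirrored)
r_mirrored = pearson(a, mirrored)

print(d_shifted, r_shifted)
print(d_mirrored, r_mirrored)
```

Here Euclidean distance ranks `mirrored` as much closer to `a` than `shifted`, while correlation says `shifted` follows the same pattern as `a` and `mirrored` follows the opposite one. Neither answer is wrong; they answer different questions.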

So, to decide on one or the other,

we need to look at the application and ask: are we comparing them in

terms of their absolute values, or are we trying to understand whether, when one is decreasing,

the other is increasing or not?

So we need to ask ourselves

what the application is, and we need to choose

the distance or similarity function based on the application that is given to us.