So, just remember that in the clustering methods we've looked at so far,

the index of a given data point

plays no role in the resulting clustering of that data point.

So, that is, we could simply permute all our data indices, and

cluster that permuted data, and we would get out exactly the same results.

But what if the order of the data points actually mattered, like in a time series,

where the label, in particular the time stamp associated with each data point,

is really critical to the analysis of that data?

So for example here, we're looking at just a univariate time series, so

the value of the observation is along the y axis.

And the time stamp of the observation is along the x axis.

And here, our goal might be to

parse this time series into what I'll call different dynamic states.

So we can think of this just as different clusters

appearing in the data set over time.

So maybe there's a green cluster, a blue cluster, and a red cluster and

this time series is switching between each of these different states.
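To make this concrete, here's a minimal sketch (with made-up state means and noise level, so all numbers are illustrative) of a univariate time series that switches between three dynamic states, where each state emits values from its own distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-state means for the green, blue, and red states,
# with enough emission noise that the states' value ranges overlap.
state_means = {"green": 0.0, "blue": 1.0, "red": 2.0}
noise_std = 0.8

# A hand-picked state sequence with persistence: stay in a state for a
# while, then switch to another one.
state_seq = ["green"] * 30 + ["blue"] * 30 + ["red"] * 30 + ["green"] * 30

# One observation (y-axis value) per time stamp (x-axis position).
y = np.array([rng.normal(state_means[s], noise_std) for s in state_seq])
```

Plotting `y` against its index would give a picture like the one described here; dropping the index ("scrunching along the x axis") leaves three heavily overlapping sets of values.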

Well, if we ignored the time stamp associated with these data points, so

think about just taking this plot and scrunching it along the x axis.

So, ignoring the time stamp, just looking at points along a one-dimensional

y axis, and trying to cluster those observations, we would have a really hard time.

Because the range of values that the time series

takes when it's in this green state overlaps quite a lot

with the set of values that the time series takes when it's in the blue and

the red states.

So all these values would be heavily overlapping, and

without tons and tons of data, it would be really hard

to distinguish the fact that there are three different clusters here.

But, instead, the structure of this data across time can actually help us

uncover this pattern, which appears immediately

obvious from looking at this plot.

In particular, we have the fact that in this data set if we're currently in

a green state it seems like we're very likely to keep being in a green state.

And then maybe there are certain other transitions between states, between

green and blue or blue and red that might be more likely than other transitions.

And this type of information can help us

extract the clustering that we're desiring from this time series data.
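One simple way to encode this kind of information is a transition probability matrix. Here's a small sketch with invented numbers, where the large diagonal entries capture the tendency to stay in the current state:

```python
import numpy as np

# Hypothetical transition matrix over the states [green, blue, red].
# Entry A[i, j] is the probability of moving from state i to state j
# at the next time step; each row sums to 1.
A = np.array([
    [0.90, 0.07, 0.03],  # from green: mostly stay green
    [0.05, 0.90, 0.05],  # from blue:  mostly stay blue
    [0.03, 0.07, 0.90],  # from red:   mostly stay red
])

# Each row is a probability distribution over the next state.
row_sums = A.sum(axis=1)
```

Notice that some off-diagonal entries are larger than others, encoding the idea that certain transitions between states are more likely than other transitions.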

So in the context of time series, we can think of this as a segmentation task.

To make this more concrete, let's look at the dance of a honeybee.

Because when a honeybee is in the beehive, it switches between three different dances

in order to communicate the location of food sources to other bees in the hive.

So it switches between this waggle dance, a turn right dance and a turn left dance.

And it keeps repeating specific patterns of these dances to communicate

with the other bees.

And so here, what we're showing are three different dance sequences that

were segmented into the waggle dance, which is the red color, turn right,

which is the green color, and turn left, which is the blue color.

But, the question is, can we automatically extract this information

just from observations of the honeybee's body position and

head angle as it's moving around in the hive?

So in particular, when we're looking at these dances, we can make a plot

of what dance the honeybee is in at every time step.

And this will indicate the persistence in certain dances and

the switches between them,

between this waggle, turn right, and turn left dance.

And what we see from this data is that there are indeed patterns.

We see persistence in the different dances.

Because if you're currently doing one type of dance,

you're likely to continue doing that dance.

And then we also see certain transitions are more likely than others.

So for example, if I'm currently in a red dance,

it's very likely I'll go to a green dance, or from a green dance to a red dance,

though obviously other transitions are possible as well.

And this type of notion of switching between different dynamic states appears

in lots of different applications.

Luckily, not just in the study of honeybees.

So for example, maybe we have conference audio of

people taking turns speaking in a conference meeting.

And we want to be able to segment that audio

into who is speaking when during the meeting.

Or maybe we have observations of a person doing a set of exercise routines.

And we want to be able to segment out, okay, this person is doing jumping jacks, or

running, or squats, and so on.

And typically when people do different exercise routines,

they switch between a set of behaviors again, and again, and again.

And likewise when we're looking at stock data,

this is a really common type of approach for thinking about this data.

Where maybe the stock indices are switching

between regimes of high volatility, low volatility,

medium volatility, or obviously other levels of volatility.

So, given the broad set of applications where we see this type of structure,

let's spend a little bit of time talking about a model

that can allow us to extract this type of information from data.

And this model is called a hidden Markov model, or an HMM for short.

And an HMM is very, very,

very similar to the type of mixture models we described earlier in this course.

So just like in a mixture model,

every observation is associated with a cluster indicator.

So here we're referring to things as clusters, but

when you talk about HMMs, often that cluster is described as a state.

We talked about these dynamic states, so we'll use the two terms interchangeably here,

just to draw connections with the mixture models that we described before.

Okay, but the point is every observation has a cluster assignment

associated with it, and that's unknown to us; all we get is the observed value.

And then, also, just like in our mixture model, every cluster or

state has a distribution over observed values.

So remember, in our mixture model we said that each component would maybe be

a Gaussian defining a distribution over the blue intensity in an image.

And that would be different if we're thinking about images of clouds,

versus sunsets, versus forests.

And so those would be the different clusters present.

Well, here we have different dynamic states, but

given a specific assignment to one of these states or clusters,

we have the same type of distribution over the observed value within that state.
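As a small sketch of that idea (with invented Gaussian parameters for the three dances), given a state assignment, an observed value is scored under that state's own distribution:

```python
import math

# Hypothetical Gaussian emission parameters (mean, std) for each state.
emissions = {
    "waggle": (0.0, 1.0),
    "turn_right": (2.0, 1.0),
    "turn_left": (4.0, 1.0),
}

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Likelihood of one observed value under each state's emission distribution.
x = 0.5
likelihoods = {s: gaussian_pdf(x, m, sd) for s, (m, sd) in emissions.items()}
```

This is exactly the mixture-model piece of the story: each state owns an emission distribution, and the observed value is generated from whichever state is currently assigned.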

But the really critical difference is the fact that the probability

of a given cluster assignment depends on the value

of the cluster assignment for the previous observation.

And this is how we're going to capture the time dependency.

This is how we're going to capture things like the fact that if we're currently

in a state,

like the waggle dance we're more likely to be in that state at the next time step.

Or maybe, if we're currently in this red state, we're more likely to

be in the green state at the next time step than to transition to the blue state.

And so,

it's through this structure of dependency in the cluster assignment variables

across time that we're able to extract this type of time-dependent clustering.
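Putting the pieces together, here's a minimal generative sketch of an HMM (all parameters invented): an initial state distribution, a transition matrix tying each cluster assignment to the previous one, and a per-state Gaussian emission distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative HMM parameters for three states.
pi = np.array([1.0, 0.0, 0.0])           # initial state distribution
A = np.array([[0.90, 0.08, 0.02],        # A[i, j]: probability of moving
              [0.05, 0.90, 0.05],        # from state i to state j at the
              [0.02, 0.08, 0.90]])       # next time step
means = np.array([0.0, 2.0, 4.0])        # per-state emission means
std = 0.5                                # shared emission noise

T = 200
states = np.empty(T, dtype=int)
obs = np.empty(T)

states[0] = rng.choice(3, p=pi)
obs[0] = rng.normal(means[states[0]], std)
for t in range(1, T):
    # The cluster assignment at time t depends only on the assignment
    # at time t - 1; this is what captures the time dependency.
    states[t] = rng.choice(3, p=A[states[t - 1]])
    obs[t] = rng.normal(means[states[t]], std)
```

Running this and plotting `obs` over time produces a sequence that persists in a state and occasionally switches, just like the dynamic-state picture described earlier; fitting an HMM reverses this process, inferring the hidden `states` from `obs` alone.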