One simple yet extraordinarily useful class of probabilistic temporal models is the class of hidden Markov models. Although these models can be viewed as a subclass of dynamic Bayesian networks, we'll see that they have their own type of structure that makes them particularly useful for a broad range of applications. So, a hidden Markov model in its simplest

form can be viewed as a probabilistic model that has a state variable S and a

single observation variable O. And so the model really has only two probabilistic pieces: there is the transition model, which tells us how the state transitions from one time point to the next, and then there is the observation model, which tells us, in a given state, how likely we are to see different observations.

Now, you can unroll this simple 2-TBN. So if this is our 2-TBN, you can unroll that to produce an unrolled network, which has the same repeated structure: the state at time zero moves to the state at time one, and so on, and at each state, we make an appropriate observation.
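To make this concrete, here is a minimal sketch of an HMM as just these two probabilistic pieces. Everything in it, the weather-style states, the observations, and all the probabilities, is made up for illustration; it simply unrolls the model by sampling forward in time:

```python
import random

random.seed(0)

# Transition model P(S' | S) and observation model P(O | S).
# States and probabilities here are illustrative only.
transition = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.2, "sunny": 0.8},
}
observation = {
    "rainy": {"umbrella": 0.9, "no_umbrella": 0.1},
    "sunny": {"umbrella": 0.2, "no_umbrella": 0.8},
}

def sample(dist):
    """Draw one outcome from a {value: probability} distribution."""
    r, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if r < cum:
            return value
    return value  # guard against floating-point round-off

def unroll(state, steps):
    """Unroll the 2-TBN: at each time slice, emit an observation
    from the current state, then transition to the next state."""
    trajectory = []
    for _ in range(steps):
        trajectory.append((state, sample(observation[state])))
        state = sample(transition[state])
    return trajectory

print(unroll("rainy", 5))
```

Note that each row of `transition` and `observation` is a distribution, so its values sum to one.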

But what's interesting about hidden Markov models is that they often have a

lot of internal structure that manifests most notably here in the transition

model, but sometimes also in the observation model. So here is an example of what a structured transition model would look like. This entire diagram is actually peering into the CPD of the probability of the next state given the current state. So each of these nodes in here is not a

random variable, rather it is a particular assignment to the state

variable, a sort of state that the model might be in.

And what you see here is the structure of the transition matrix that tells us that

from S1, for example, the model transitions to S2 with probability 0.7 or stays in S1 with probability 0.3. And so

these two outgoing probabilities have to sum to one because it's a probability

distribution over the next state, given that in the current time point the model

is in state S1. And we similarly have that for all other states; so here, from S4, for example, there is a probability of 0.9 of going to S2 and 0.1 of staying at S4.
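That sparse transition model can be written down directly by storing only the nonzero entries. In the sketch below, the S1 and S4 rows use the probabilities from the example (0.7/0.3 and 0.9/0.1); the S2 and S3 rows are invented placeholders just to complete the model:

```python
# Sparse transition model: for each state, only the nonzero
# outgoing probabilities are stored.  The S1 and S4 rows use the
# numbers from the example; S2 and S3 are illustrative guesses.
P = {
    "S1": {"S2": 0.7, "S1": 0.3},
    "S2": {"S3": 1.0},
    "S3": {"S4": 0.6, "S1": 0.4},
    "S4": {"S2": 0.9, "S4": 0.1},
}

# Every row is a distribution over the next state, so it must sum to 1.
for state, row in P.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, state

def p_next(s, s_next):
    """P(S' = s_next | S = s); absent entries are structural zeros."""
    return P[s].get(s_next, 0.0)

print(p_next("S1", "S2"), p_next("S1", "S4"))  # 0.7 0.0
```

The absent entries are structural zeros: transitions the model can never make, which is exactly the sparsity described above.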

So here, the structure is actually sparsity in the transition model, as opposed to something that manifests at the level of the 2-TBN structure, which is

actually fairly simple. It turns out that this kind of structure

is useful for a broad range of applications: robot localization; speech recognition, where HMMs are really the method of

choice for all current speech recognition systems.

Biological sequence analysis, for example annotating a DNA sequence with elements that are functionally important and other elements that are not; and similarly annotating a text sequence, for example annotating words with their role in the sentence. All of these are applications where hidden

Markov models have been used with great success.

So, let's look, for example, at what the HMM would look like for robot localization. This might not look exactly like an HMM to begin with, because it has some additional variables, so let's talk about what each of these

variables means. Here, what we have is a state variable

that represents the robot pose, that is, the position and potentially the orientation

of the robot within a map at each point in time.

We also have an external control signal u, which is the guidance that the robot

is given: move left, move right, turn around. Since these variables are observed and externally imposed, they're not really stochastic random variables;

they are just inputs to the system if you will.

And then we have, in addition, the observation variables, which are what the robot observes at each point in the process; these depend on both the robot's position and, of course, on the map. So the robot's observations depend on the overall architecture of the space that it's in and the state that it's in. Now, since in many of the applications

that we'll be considering the map is observed, which is why I just grayed it out, you can effectively think of this as something where the basic structure over which we're reasoning is just the set of variables that represent the S's and the O's, which is why this is really a slight elaboration of the standard HMM model.
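One way to picture this is a transition model conditioned on both the current pose and the control input u. The sketch below uses a toy one-dimensional corridor and made-up slip probabilities; nothing here comes from an actual robot system:

```python
# Toy motion model on a 1-D corridor of cells 0..9.
# The next pose depends on the current pose S and the control u
# ("left" or "right"); the robot mostly obeys the command but
# sometimes slips and stays put.  All numbers are illustrative.
N_CELLS = 10

def motion_model(pose, u):
    """Return P(S' | S=pose, u) as a {next_pose: prob} dict."""
    step = 1 if u == "right" else -1
    target = min(max(pose + step, 0), N_CELLS - 1)
    if target == pose:               # blocked by a wall
        return {pose: 1.0}
    return {target: 0.8, pose: 0.2}  # 0.8 move, 0.2 slip

print(motion_model(4, "right"))  # most mass on cell 5, some on cell 4
```

Note how sparse each conditional distribution is: from any pose, only a couple of next poses have nonzero probability, which is the localization-specific sparsity discussed next.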

And here also, we're going to have sparsity in the transition model, because the robot can't jump from one state to another within a single step, and so there is only going to be a limited set of possible positions at time T plus one given where the robot is at time T. Speech recognition, as I mentioned, is

perhaps the prototypical HMM success story,

because it's so much in common use.

The speech recognition problem is to take a sentence, such as "Dorothy lived in...", and imagine somebody saying it; what is given as input to the probabilistic reasoning system is a very noisy acoustic signal that represents the energy at the different acoustic frequencies of speech at each point in time. And what we would

like to take that input and put it into a decoder

that is going to evaluate different possible sentences,

and hopefully guess what the source sentence is and be able to predict it

with reasonable accuracy. So how does speech recognition work?

So this is what an acoustic signal looks like: we can see that over here, at each point in time, we have these frequencies, and we can convert that to the frequency spectrum by using a Fourier transform.
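That conversion is just a discrete Fourier transform applied to a short frame of the signal. The sketch below, using NumPy, runs it on a synthetic 440 Hz tone standing in for one frame of speech; the tone frequency and frame length are arbitrary choices:

```python
import numpy as np

# Synthesize one 25 ms frame of a pure 440 Hz tone sampled at 16 kHz,
# standing in for one short window of a real speech signal.
sample_rate = 16000
t = np.arange(int(0.025 * sample_rate)) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)

# Discrete Fourier transform of the frame -> magnitude spectrum.
spectrum = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)

peak = freqs[np.argmax(spectrum)]
print(peak)  # close to 440 Hz (limited by the frame's resolution)
```

A real front end repeats this for every overlapping frame, producing the time-by-frequency picture the lecture describes.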

And what we would like to do is break up this continuous signal into pieces that correspond to words and recognize, for each piece, which word it corresponds to. So this is a twofold problem, because in general,

in continuous speech, we have to identify the boundaries between words even as we are trying to identify the words themselves. And it turns out that the way to do this

is not to think about words as the basic unit,

but rather think about these smaller entities that are called phones or

phonemes, as the basic units of which words are composed,

and then, as we recognize phones, we can put them together to figure out what the

words mean. So, here is a phonetic alphabet, one of several that are used to define how words are broken up into these little pieces. And so you can see that these phones don't line up exactly with characters in the alphabet,

because the same character can actually be pronounced in different ways giving

rise to different phones; and vice versa, sometimes the same phone can come from different letters. And so what we see here is, for example, the phone called ER, which is pronounced in the same way as the "ur" in "hurt". So this is a phonetic alphabet, and, as I said, there are many of those that one can consider.

So once we have the phonetic alphabet, we can look at a word; in this case, this is the phonetic breakdown of the word "nine": n, ay, n.

So you can see that this is an HMM structure; this is not the DBN, this is the HMM that tells us, at each point in time, which phone you're within, n or ay, and so there is a self-transition loop, because you can stay in the same phone for more than one time unit.
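That left-to-right structure with self-loops is easy to write down. In the sketch below, only the phone sequence n, ay, n comes from the example; the 0.6 self-loop probability is an invented placeholder:

```python
import random

random.seed(1)

# Left-to-right HMM for the word "nine": phones n -> ay -> n,
# each with a self-loop.  The 0.6 stay / 0.4 advance probabilities
# are illustrative, not from any real recognizer.
phones = ["n", "ay", "n"]
stay_prob = 0.6

def sample_phone_path():
    """Sample a phone sequence: dwell in each phone via its
    self-loop, then advance to the next phone until the word ends."""
    path = []
    for phone in phones:
        path.append(phone)
        while random.random() < stay_prob:  # self-transition
            path.append(phone)
    return path

print(sample_phone_path())  # a variable-length run through n, ay, n
```

The self-loops are what let the same phone occupy several time slices, modeling how long the speaker dwells on each sound.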

And then, eventually, you transition to the next phone, and the next, and this is a typical HMM for a word. Now, a phone also lasts a certain amount of time, and so within a given phone there is also a transition model. In this case, the phone has three states: the beginning, the middle, and the end.

And this is a fairly standard breakdown for most phones: the beginning of the phone, the middle, and the end. And each of these typically corresponds

to a different distribution over the acoustic signal that you see at that

stage in the phone. So now let's put all these together, if you're trying to do speech recognition; and this is, for the moment, speech recognition for isolated words. So there is a start state,

and then there is a transition model from the start state that tells us how likely we are, from the current state, to go into one of many words; and that would be

a language model that tells us how likely different words are to occur at this

point. And then, given that the model has transitioned to, say, the word "one", we can see that, across different points in time, the model transitions through the w's, then ultimately through the ah's, and then finally to the n's.

And, then, at the end of it, it transitions to the end of the word, at

which point the process circles back and we move on to the next word.

And the self-transitions, and the transition back to the start state, are what allow us to do continuous speech recognition.

So this is a probabilistic model that tells us how speech might be constructed

of first words, then phones within words, and then finally the little pieces of the phone that we saw in the phone HMM previously.

And, this is a generative model of speech,

but what happens is that as you feed in evidence about the observed acoustic

signal over here, and you run probabilistic inference over this model, what you get out is the most likely sequence of words that gave rise to the observed speech signal.

So to summarize, HMMs in some simplistic way can be viewed

as a subclass of the framework of dynamic Bayesian networks.

And while they seem unstructured when we view them at the level of random variables, there is a lot of structure that manifests in the sparsity of the conditional probabilities and also in terms of repeated elements. As we saw in the previous slide,

the phone model can occur in multiple words, and we replicate that structure across the different places where the same phone can be used in the language. And so that gives a lot of structure in

the transition model that really doesn't manifest in any way at the level of

random variables. And we've seen that HMMs are used in a wide variety of applications for sequence modeling, and they're probably one of the

most used forms of probabilistic graphical models out there.