And this is just the sum of all the counts.

And this further allows us to solve the optimization problem and eventually find the optimal setting for theta sub i.

And if you look at this formula, it turns out to be very intuitive, because this is just the count of each word normalized by the document length, which is the sum of the counts of all the words in the document.

So, after all this math, we have obtained something very intuitive, and it matches our intuition: we want to maximize the likelihood of the data by assigning as much probability mass as possible to the observed words here.

And you might also notice that this is a general result of maximum likelihood estimation.

In general, the estimate is a normalized count; it's just that sometimes the counts have to be computed in a particular way, as you will see later.

So this is basically an analytical solution to our optimization problem.
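As a minimal sketch of this closed-form solution (the function name and toy document below are my own, not from the lecture), the estimate for each word is just its count divided by the document length:

```python
from collections import Counter

def mle_unigram(tokens):
    """Maximum likelihood estimate for a unigram language model:
    theta_i = c(w_i, d) / |d|, i.e. each word's count normalized
    by the document length (the sum of all word counts)."""
    counts = Counter(tokens)
    total = sum(counts.values())  # |d|
    return {w: c / total for w, c in counts.items()}

# A toy "text mining paper" fragment as a token list.
doc = ("the text mining paper presents a text mining method "
       "and the method is applied to the text data").split()

theta = mle_unigram(doc)

# Probabilities sum to 1, and the most frequent words get the most
# mass -- note that a common function word like "the" ranks at the top.
for w, p in sorted(theta.items(), key=lambda kv: -kv[1])[:4]:
    print(f"{w}: {p:.3f}")
```

Running this, the top-ranked words are the most frequent ones in the document, regardless of whether they carry topical content, which is exactly the issue discussed next.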

In general, though, when the likelihood function is very complicated, we're not going to be able to solve the optimization problem with a closed-form formula.

Instead, we have to use numerical algorithms, and we're going to see such cases later as well.

So imagine what we would get if we used such a maximum likelihood estimator to estimate one topic for a single document d here.

Let's imagine this document is a text mining paper.

Now, what you might see is something that looks like this.

At the top, you will see that the high-probability words tend to be very common words, often function words in English.

And these will be followed by some content words that really characterize the topic well, like text, mining, etc.

And then, at the end, you also see some probability mass on words that are not really related to the topic but might be mentioned only in passing in the document.

As a topic representation, you will see this is not ideal, right? That's because the high-probability words are function words, which don't really characterize the topic.

So my question is how can we get rid of such common words?