So this would be, then, our estimate of the parameters.

And these parameters,

note, are precisely what we hoped to discover from the text data.

So we'd treat these parameters as actually the outcome or

the output of the data mining algorithm.

So this is the general idea of using

a generative model for text mining.

First, we design a model with some parameter values to fit

the data as well as we can.

After we have fit the data, we will recover some parameter values.

We will use those specific parameter values, and

those would be the output of the algorithm.

And we'll treat those as actually the discovered knowledge from text data.

By varying the model of course we can discover different knowledge.

So to summarize, we introduced a new way of representing a topic,

namely representing it as a word distribution, and this has the advantage of using

multiple words to describe a complicated topic. It also allows us to assign

weights to words, so we can capture subtle variations of semantics.
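As a minimal sketch of this idea, here is a toy topic represented as a word distribution over a small made-up vocabulary (the topic name, words, and probabilities are all hypothetical, chosen only for illustration):

```python
# A hypothetical "sports" topic as a word distribution: each vocabulary
# word gets a probability, and the probabilities sum to 1.
sports_topic = {
    "game": 0.30,
    "team": 0.25,
    "win": 0.20,
    "player": 0.15,
    "season": 0.10,
}

# The weights rank words by how characteristic they are of the topic,
# so multiple words together describe it, with graded importance.
top_word = max(sports_topic, key=sports_topic.get)
```

Here "game" carries the most weight, but the lower-weighted words still contribute to describing the topic.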

We talked about the task of topic mining and analysis

when we define a topic as a word distribution.

So the input is a collection of text articles, a number of topics, and

a vocabulary set, and the output is a set of topics.

Each is a word distribution and

also the coverage of all the topics in each document.

And these are formally represented by theta i's and pi i's.

And we have two constraints here for these parameters.

The first is the constraint on the word distributions.

In each word distribution, the probabilities of all the words

in the vocabulary must sum to 1.

The second constraint is on the topic coverage in each document.

A document is not allowed to cover a topic outside of the set of topics that

we are discovering.

So, the coverage of each of these k topics would sum to one for a document.
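The two constraints can be sketched with toy numbers (all values below are made up for illustration; theta holds the word distributions and pi the per-document topic coverages):

```python
# theta[i][w]: probability of word w under topic i.
# Constraint 1: each topic's word probabilities sum to 1.
theta = [
    {"game": 0.6, "loan": 0.1, "gene": 0.3},  # topic 1
    {"game": 0.1, "loan": 0.7, "gene": 0.2},  # topic 2
]

# pi[d][i]: coverage of topic i in document d.
# Constraint 2: each document's coverages over the k topics sum to 1,
# so a document cannot cover any topic outside the k we are discovering.
pi = [
    [0.8, 0.2],  # document 1: mostly topic 1
    [0.3, 0.7],  # document 2: mostly topic 2
]

for dist in theta:
    assert abs(sum(dist.values()) - 1.0) < 1e-9
for coverage in pi:
    assert abs(sum(coverage) - 1.0) < 1e-9
```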

We also introduce a general idea of using a generative model for text mining.

And the idea here is, first we design a model of the generation of the data.

We simply assume that the data are generated in this way.

And inside the model we embed some parameters that we're interested in

denoted by lambda.
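A minimal sketch of this generative-model idea, assuming the simplest case of a single topic: each word is drawn independently from one word distribution (the parameter lambda), and fitting the model by maximum likelihood just normalizes the word counts (the example text and the helper name fit_unigram_model are hypothetical):

```python
from collections import Counter

def fit_unigram_model(words):
    """Maximum-likelihood estimate of a single word distribution:
    the probability of each word is its count divided by the total."""
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}

text = "the game was a good game for the team".split()
lam = fit_unigram_model(text)  # lam plays the role of the parameter lambda
# lam["game"] == 2/9, since "game" occurs twice among nine words.
```

The estimated distribution is the output of the fitting step, and those parameter values are what we treat as the discovered knowledge.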