This contribution deals with a generative approach to the analysis of textual data.
Instead of creating heuristic rules for the representation
of documents and word counts, we employ a distribution that models word
occurrences across texts while accounting for different topics. In this regard,
following Minka's proposal (2003), we implement a Dirichlet Compound Multinomial
(DCM) distribution; we then propose an extension, called sbDCM, that explicitly
takes into account the different latent topics that compose the document.
We follow two alternative approaches: in the first, the topics are unknown
and are therefore estimated from the data; in the second, the topics are
determined in advance from a predefined ontological schema. The two
approaches are assessed on real data.
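
As an illustration of the DCM distribution mentioned above, the following is a minimal sketch of its log-probability for a single document represented as a vector of word counts. It uses the standard Dirichlet-multinomial form and covers only the plain DCM, not the sbDCM extension, whose details are not given here. The function name `dcm_log_pmf` and the parameter names `counts` and `alpha` are assumptions introduced for this example.

```python
from math import lgamma


def dcm_log_pmf(counts, alpha):
    """Log-probability of a word-count vector under a DCM distribution.

    counts: non-negative integer counts, one per vocabulary word (hypothetical name).
    alpha:  positive Dirichlet parameters, one per vocabulary word (hypothetical name).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(x < 0 for x in counts) or any(a <= 0 for a in alpha):
        raise ValueError("counts must be >= 0 and alpha must be > 0")

    n = sum(counts)        # document length
    a_sum = sum(alpha)     # total concentration

    # Multinomial coefficient: n! / prod(x_w!)
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Normalising ratio: Gamma(sum alpha) / Gamma(n + sum alpha)
    log_norm = lgamma(a_sum) - lgamma(n + a_sum)
    # Per-word terms: Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_words = sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))

    return log_coef + log_norm + log_words
```

For example, `dcm_log_pmf([3, 0, 1], [0.5, 1.5, 2.0])` scores a four-word document over a three-word vocabulary. Working in log space with `lgamma` avoids overflow of the Gamma function for long documents.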