Cluster Analysis Based on Contextual Features Extraction for Conversational Corpus

DOI: 10.4236/jcc.2015.35004, PP. 33-37

Keywords: Conversational Corpus, Contextual Features, VSM, SOM

Abstract:

Cluster analysis in computational linguistics has seldom addressed the pragmatic level. Corpus features at the pragmatic level relate to specific situations, including backgrounds, titles and habits. Improving the accuracy of clustering for conversations collected from international students at Tsinghua University therefore requires contextual features. Here, we collected four hundred conversations as a corpus and represented it as a Vector Space Model (VSM). Using the Oxford-Duden Dictionary and other methods, we modified the model and arrived at three groups. We tested our hypothesis with a self-organizing map (SOM) neural network. The results suggest that the modified model gave a better outcome.
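The pipeline the abstract describes (a vector space model fed to a self-organizing map) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the toy conversations, the TF-IDF weighting, the one-dimensional three-node map, and the names `tfidf_vectors`, `best_matching_unit` and `train_som` are all hypothetical. The dictionary-based contextual modification of the model is not reproduced, because the abstract does not specify it.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (a simple VSM) from tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        # Smoothed IDF (log(N/df) + 1) so that common terms keep a nonzero weight.
        vec = [
            (term_freq[t] / len(doc)) * (math.log(n_docs / doc_freq[t]) + 1.0)
            for t in vocab
        ]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def best_matching_unit(weights, x):
    """Return the index of the map node closest to x (squared Euclidean distance)."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(vectors, n_nodes=3, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map; hyperparameters are illustrative guesses."""
    rng = random.Random(seed)
    # Initialize node weights from randomly chosen input vectors.
    weights = [list(v) for v in rng.sample(vectors, n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.01)     # neighborhood width shrinks
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood over node positions on the 1-D map.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Toy stand-ins for conversations; the real corpus had four hundred of them.
docs = [
    "where is the library".split(),
    "the library is near the gate".split(),
    "i want noodles for lunch".split(),
    "lunch at the canteen noodles".split(),
    "my visa needs renewal".split(),
    "my visa needs renewal".split(),
]
vectors = tfidf_vectors(docs)
weights = train_som(vectors, n_nodes=3)
assignments = [best_matching_unit(weights, v) for v in vectors]
print(assignments)
```

Each conversation is assigned to the node whose weight vector is nearest, so the three nodes play the role of three clusters. In a setup like the one the abstract describes, a modified model would change only the vectors passed to `train_som`, which makes it straightforward to compare the clustering before and after the modification.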
