Clusters Merging Method for Short Texts Clustering

DOI: 10.4236/jss.2014.29032, pp. 186-192

Keywords: Short Texts Clustering, Slide Window, Information Gain, Hierarchical Clustering

Abstract:

Driven by the mobile Internet, new social media such as microblogs, WeChat, and question-answering systems are constantly emerging. They produce huge volumes of short texts, which pose new challenges for text clustering. To cope with the large volume and dynamic growth of short texts, a two-stage clustering method is proposed. The method slides a window over the stream of short texts. Inside each window, hierarchical clustering is applied; between windows, clusters are merged using a method based on information gain. Experiments indicate that the method is fast and achieves higher accuracy.
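
The two stages described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not specify the similarity measure, the linkage, the window step, or the exact information-gain criterion, so everything named here is an assumption. In particular, `cluster_window`, `split_gain`, `merge_clusters`, `cluster_stream`, and the thresholds `min_sim` and `max_gain` are hypothetical; Jaccard similarity with average linkage stands in for the in-window hierarchical clustering; windows advance by their full length; and two clusters are merged when the information gain from keeping them separate (entropy of the merged term distribution minus the size-weighted entropy of the parts) is small.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_window(texts, min_sim=0.3):
    """Stage 1 (assumed): average-linkage agglomerative clustering in one window."""
    docs = [set(t.lower().split()) for t in texts]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [jaccard(docs[a], docs[b])
                        for a in clusters[i] for b in clusters[j]]
                avg = sum(sims) / len(sims)
                if avg > best:
                    best, pair = avg, (i, j)
        if best < min_sim:  # no pair is similar enough: stop merging
            break
        i, j = pair
        absorbed = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(absorbed)
    # Summarise each cluster by its term counts.
    return [Counter(tok for d in c for tok in texts[d].lower().split())
            for c in clusters]


def entropy(counts):
    """Shannon entropy (bits) of a term-count distribution."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split_gain(c1, c2):
    """Information gain from keeping c1 and c2 separate (assumed criterion)."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    return entropy(c1 + c2) - (n1 * entropy(c1) + n2 * entropy(c2)) / (n1 + n2)


def merge_clusters(existing, incoming, max_gain=0.5):
    """Stage 2 (assumed): fold each new cluster into its closest existing one."""
    for new in incoming:
        best = min(existing, key=lambda c: split_gain(c, new), default=None)
        if best is not None and split_gain(best, new) <= max_gain:
            best.update(new)  # Counter.update adds the term counts
        else:
            existing.append(new)
    return existing


def cluster_stream(texts, window=2, min_sim=0.3, max_gain=0.5):
    """Process the text stream window by window, merging as we go."""
    clusters = []
    for start in range(0, len(texts), window):
        local = cluster_window(texts[start:start + window], min_sim)
        clusters = merge_clusters(clusters, local, max_gain)
    return clusters


stream = [
    "cheap flights paris", "cheap flights rome",
    "python sort list", "python sort dict",
    "cheap flights london", "python sort tuple",
]
result = cluster_stream(stream, window=2)
print(len(result))
```

Because each window is clustered on its own and only compact term-count summaries are carried forward, the cost of the quadratic hierarchical step is bounded by the window size rather than by the length of the stream, which is one plausible reading of why the abstract reports the method as fast.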

