Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2007

A Generalized Method for Unsupervised Text Clustering with Finite Mixture Models

pp. 698-703

Zhang Liang, Li Minqiang (张亮, 李敏强)

Keywords: finite mixture, unsupervised learning, text clustering, feature selection, model selection, expectation-maximization (EM) algorithm

Abstract:

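This paper proposes a generalized method for unsupervised text clustering based on finite mixture models. The relevance of each feature to each mixture component is introduced into the mixture model as a hidden variable, so that model selection, feature selection, and parameter estimation for the mixture are all carried out within a single unified framework. Experiments on large-scale text datasets show that the method performs well on all three counts: model selection, feature selection, and clustering quality.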
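
The abstract does not give the model itself. As a rough illustration only, one way to write a mixture in which feature relevance is a per-component hidden variable is to extend the feature-saliency formulation of Law, Figueiredo and Jain (IEEE TPAMI, 2004) so that each saliency depends on the component. Every symbol below is an assumption made for this sketch, not notation from the paper: K components with weights \alpha_j, D features, \rho_{jl} the probability that feature l is relevant to component j, p(\cdot \mid \theta_{jl}) a component-specific feature density, and q(\cdot \mid \lambda_l) a background density shared by all components. The updates shown are plain maximum-likelihood EM steps for \alpha_j and \rho_{jl}; how the paper performs model selection is not stated in the abstract and is not represented here.

```latex
% Mixture density with component-specific feature relevance (illustrative assumption)
\begin{align}
p(y \mid \Theta) &= \sum_{j=1}^{K} \alpha_j \prod_{l=1}^{D}
  \Bigl[ \rho_{jl}\, p(y_l \mid \theta_{jl}) + (1-\rho_{jl})\, q(y_l \mid \lambda_l) \Bigr],
  \qquad \sum_{j=1}^{K} \alpha_j = 1 .
\intertext{E-step for documents $y_1,\dots,y_N$, with
  $a_{ijl} = \rho_{jl}\, p(y_{il} \mid \theta_{jl})$ and
  $b_{ijl} = (1-\rho_{jl})\, q(y_{il} \mid \lambda_l)$:}
w_{ij} &= \frac{\alpha_j \prod_{l=1}^{D} (a_{ijl} + b_{ijl})}
               {\sum_{k=1}^{K} \alpha_k \prod_{l=1}^{D} (a_{ikl} + b_{ikl})},
\qquad
u_{ijl} = w_{ij}\, \frac{a_{ijl}}{a_{ijl} + b_{ijl}} .
\intertext{M-step for the mixing weights and relevances:}
\alpha_j &= \frac{1}{N} \sum_{i=1}^{N} w_{ij},
\qquad
\rho_{jl} = \frac{\sum_{i=1}^{N} u_{ijl}}{\sum_{i=1}^{N} w_{ij}} .
\end{align}
```

Here w_{ij} is the posterior probability that document i belongs to component j, and u_{ijl} is the posterior probability that it belongs to component j with feature l relevant. Under this sketch, a value of \rho_{jl} near zero marks feature l as uninformative for component j, which is the sense in which feature selection and parameter estimation share one EM loop.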
