Acta Electronica Sinica, 2012

Document Clustering Based on Probabilistic Topic Models

DOI: 10.3969/j.issn.0372-2112.2012.11.033, pp. 2346-2350

Wang Lidong, Wei Baogang, Yuan Jie (王李冬, 魏宝刚, 袁杰)

Keywords: topic model, LDA model, TC_LDA model, document clustering

Abstract:

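To cluster ordinary text corpora and digital-book corpora effectively, this paper proposes a clustering algorithm based on the traditional LDA (Latent Dirichlet Allocation) model for the former and one based on the TC_LDA model for the latter. TC_LDA extends LDA by jointly modeling the topics of a book's table of contents and its body text. Unlike traditional methods, topic-model-based clustering groups documents that share the same topic into one cluster. Experimental results show that clustering algorithms designed from a topic-analysis perspective outperform traditional clustering algorithms.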
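The idea of grouping documents that share a topic can be illustrated with a minimal sketch. It assumes a common recipe: fit plain LDA by collapsed Gibbs sampling, then assign each document to its highest-probability topic. The paper's exact assignment rule, hyperparameters, and the TC_LDA extension (joint table-of-contents and body modeling) are not reproduced here. The function names `lda_gibbs` and `cluster_by_dominant_topic`, the values `alpha=0.1` and `beta=0.01`, and the toy corpus are all hypothetical choices made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch, not the paper's code).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    alpha, beta: symmetric Dirichlet priors (hypothetical default values).
    Returns theta, one topic distribution (list of floats) per document.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]    # occurrences of word w assigned to topic k
    n_k = [0] * K                         # total tokens assigned to topic k
    z = []                                # current topic of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each document's topic mixture.
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]


def cluster_by_dominant_topic(theta):
    """Assign each document to its most probable topic (assumed clustering rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-3 and 4-7 stand for two unrelated vocabularies.
    docs = [[0, 1, 2, 3, 0, 1], [1, 2, 3, 0, 2, 3], [4, 5, 6, 7, 4, 5], [5, 6, 7, 4, 6, 7]]
    theta = lda_gibbs(docs, num_topics=2, vocab_size=8)
    print(cluster_by_dominant_topic(theta))
```

The cluster labels are arbitrary topic indices, and because the sampler is stochastic, a single run is not guaranteed to recover a clean split. On separable toy data like this, though, the first two and last two documents typically land in different clusters. Gibbs sampling is only one way to fit LDA; variational inference is a common alternative.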
