Research on a Parallel Method for Building LDA Topic Models

Keywords: MapReduce architecture, parallel computing, latent Dirichlet allocation model, topic modeling

Abstract:

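The latent Dirichlet allocation (LDA) model takes a long time to compute when it is used to extract the latent topic information in a large document collection or corpus. To address this problem, a parallel method for building LDA topic models based on the MapReduce architecture is proposed. A distributed programming model is used to study a parallel implementation of LDA topic model building. Experiments on the Hadoop parallel computing platform show that, when processing large-scale text, the method achieves a near-linear speedup and also improves the quality of the resulting topic model.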
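
The abstract does not state which inference algorithm is parallelized, so the following is only an illustrative sketch and not the authors' method. It assumes approximate distributed collapsed Gibbs sampling: each map task resamples topic assignments for its own shard of documents against a snapshot of the global word-topic counts, and a reduce step merges the per-shard count changes. The map and reduce steps are simulated in a single process rather than on Hadoop, and all names (`map_shard`, `reduce_deltas`, `run`) and the hyperparameter values are hypothetical.

```python
import random
from collections import Counter


def map_shard(docs, assign, word_topic, topic_total, K, V, alpha, beta, rng):
    """Map step: resample topics for one shard against a local copy of the
    global counts, and return the count changes this shard produced."""
    wt = Counter(word_topic)   # local snapshot of global word-topic counts
    tt = list(topic_total)     # local snapshot of per-topic token totals
    delta = Counter()
    for d, doc in enumerate(docs):
        dt = Counter(assign[d])  # document-topic counts for this document
        for i, w in enumerate(doc):
            old = assign[d][i]
            # Remove the current token from all counts before resampling.
            dt[old] -= 1
            wt[(w, old)] -= 1
            tt[old] -= 1
            # Collapsed Gibbs conditional (unnormalized) for each topic.
            weights = [
                (dt[k] + alpha) * (wt[(w, k)] + beta) / (tt[k] + V * beta)
                for k in range(K)
            ]
            new = rng.choices(range(K), weights)[0]
            assign[d][i] = new
            dt[new] += 1
            wt[(w, new)] += 1
            tt[new] += 1
            if new != old:
                delta[(w, old)] -= 1
                delta[(w, new)] += 1
    return delta


def reduce_deltas(word_topic, deltas, K):
    """Reduce step: apply every shard's count changes to the global counts."""
    merged = Counter(word_topic)
    for delta in deltas:
        for key, change in delta.items():
            merged[key] += change
    topic_total = [0] * K
    for (_, k), count in merged.items():
        topic_total[k] += count
    return merged, topic_total


def run(shards, K, V, iters=5, alpha=0.1, beta=0.01, seed=0):
    """Driver: random initialization, then repeated map and reduce rounds.
    Each shard is a list of documents; each document is a list of word ids."""
    rng = random.Random(seed)
    assigns = [[[rng.randrange(K) for _ in doc] for doc in shard]
               for shard in shards]
    word_topic = Counter(
        (w, z)
        for shard, shard_assign in zip(shards, assigns)
        for doc, zs in zip(shard, shard_assign)
        for w, z in zip(doc, zs)
    )
    topic_total = [0] * K
    for (_, k), count in word_topic.items():
        topic_total[k] += count
    for _ in range(iters):
        # Every mapper sees the same snapshot of the global counts.
        deltas = [
            map_shard(s, a, word_topic, topic_total, K, V, alpha, beta, rng)
            for s, a in zip(shards, assigns)
        ]
        word_topic, topic_total = reduce_deltas(word_topic, deltas, K)
    return assigns, word_topic, topic_total
```

Because each shard samples against counts that are only synchronized once per iteration, the result approximates sequential Gibbs sampling rather than reproducing it. On a real cluster the list comprehension over shards would correspond to independent map tasks, which is where a speedup of the kind reported in the abstract would come from.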
