An Improved LDA Model for Microblog Topic Mining

pp. 93-101

Keywords: Sina Weibo, text mining, RT-LDA, Gibbs sampling

Abstract:

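As the number of Sina Weibo users continues to grow, microblog sites have become a platform where many people get their information. Microblog posts, however, are an unusual kind of text: their length is strictly limited, so traditional topic models do not analyze their content well. This paper proposes RT-LDA, an LDA-based generative model for microblog posts, to address the length limitation. The model is inferred with Gibbs sampling; it accurately mines the topic of each post and also summarizes the distribution of topics that a user follows. Experiments on a real-world dataset show that RT-LDA performs topic mining on microblog posts well.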
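
For readers unfamiliar with Gibbs-sampling inference in topic models, the following is a minimal sketch of collapsed Gibbs sampling for standard LDA. It is not the RT-LDA model, whose generative process is not described in this abstract. The function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus in the usage note are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampler for plain LDA (illustrative, NOT RT-LDA).

    docs: list of documents, each a list of integer word ids in
    [0, vocab_size). alpha and beta are symmetric Dirichlet priors
    (assumed values). Returns (theta, phi, z): document-topic
    distributions, topic-word distributions, and topic assignments.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]              # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other z, w).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates from the final sample.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(n_topics)
    ]
    return theta, phi, z
```

As a usage example, calling `lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1]], n_topics=2, vocab_size=4)` on this made-up corpus returns one topic distribution per document in `theta` and one word distribution per topic in `phi`. Very short documents give the sampler few co-occurrence counts per document, which is the sparsity problem the abstract attributes to microblog text.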

