A k-Topic Incremental Training Algorithm Based on LDA

PP. 1242-1252

Keywords: artificial intelligence, LDA, variational inference, incremental training, topic classification, natural language processing


Abstract:

Because the LDA model requires the number of topics k to be specified in advance, selecting the optimal k normally means re-running the model over the corpus for every candidate value of k, which greatly increases the computational cost. To address the problem of choosing the optimal k for LDA, an incremental topic training algorithm is proposed. The method first uses the entropy of each word's topic distribution as the criterion for identifying ambiguous words during LDA iterations and assigns the extracted ambiguous words to a new topic; second, it enlarges the dimensions of the global parameters β (the word-topic probability matrix) and α (the Dirichlet prior parameter) used in variational inference, together with the topic count k; third, it feeds the expanded β, α and k back into variational training as inputs; finally, it invokes the incremental training algorithm repeatedly and stops when the likelihood converges, thereby completing the incremental training of k. Experiments on real-world datasets verify the effectiveness and feasibility of the proposed algorithm for selecting the optimal k.
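To make the parameter-expansion step concrete, below is a minimal Python sketch of the idea described in the abstract. It is not the authors' implementation: variational_em is a hypothetical placeholder for an LDA variational trainer, α is assumed to be a length-k Dirichlet parameter vector, and the entropy-quantile threshold (entropy_quantile) is an assumed stand-in for the paper's ambiguous-word selection rule.

# Sketch of the incremental-k loop, under the assumptions stated above.
import numpy as np

def word_topic_entropy(beta):
    """Entropy of p(topic | word) for each word.

    beta: (k, V) word-topic probability matrix, one row per topic.
    Words whose topic assignment is spread out (high entropy) are
    treated as ambiguous and moved to a new topic.
    """
    p = beta / beta.sum(axis=0, keepdims=True)       # p(topic | word), shape (k, V)
    return -(p * np.log(p + 1e-12)).sum(axis=0)      # entropy per word, shape (V,)

def expand_topic(beta, alpha, ambiguous):
    """Add one topic: grow beta by one row and alpha by one dimension.

    The new topic's row concentrates its mass on the ambiguous words.
    """
    k, V = beta.shape
    new_row = np.full(V, 1e-6)
    new_row[ambiguous] = 1.0
    new_row /= new_row.sum()
    beta_new = np.vstack([beta, new_row])
    alpha_new = np.append(alpha, alpha.mean())       # reuse the average prior weight
    return beta_new, alpha_new

def incremental_lda(corpus, beta, alpha, variational_em,
                    entropy_quantile=0.95, tol=1e-4):
    """Grow k until the corpus likelihood stops improving."""
    prev_ll = -np.inf
    while True:
        # variational_em is a hypothetical trainer returning updated
        # (beta, alpha) and the corpus log-likelihood at the current k.
        beta, alpha, ll = variational_em(corpus, beta, alpha)
        if ll - prev_ll < tol:                       # likelihood converged: stop
            return beta, alpha, beta.shape[0]        # current fit and its k
        prev_ll = ll
        ent = word_topic_entropy(beta)
        ambiguous = np.where(ent >= np.quantile(ent, entropy_quantile))[0]
        beta, alpha = expand_topic(beta, alpha, ambiguous)

Growing β by one row and α by one dimension before re-entering variational training mirrors the abstract's second and third steps; the outer loop with the likelihood-based stopping test mirrors the final step.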

