A User Interest Model Construction Method Based on the BERT Model and the LDA Topic Model
Abstract:
User interest modeling analyzes a user's interests by combining hobby information, browsing behavior, user profile data, and other signals. As a key component of personalized information recommendation and an important part of personalized services, the user interest model directly determines the quality of the recommendation service. To improve the quality of user interest modeling, this paper introduces a word vector model and a topic model to represent user interests accurately, and proposes a user interest model construction method based on the BERT model and the LDA topic model. The method fuses BERT with LDA: during training, the model not only exploits the contextual information of the entire data set but also uses LDA to capture latent semantic information, and user interests are then extracted with K-means clustering. Experimental results show that the combined modeling method effectively alleviates the sparsity and context dependence of microblog short texts and improves the quality of the user interest model compared with other methods.
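The abstract describes the pipeline only at a high level. The following is a minimal sketch, assuming the Hugging Face transformers, gensim, and scikit-learn libraries, of one way the described fusion could be realized: mean-pooled BERT embeddings are concatenated with LDA document-topic distributions, and the fused vectors are clustered with K-means. The model name bert-base-chinese, the example posts, and the num_topics / n_clusters settings are illustrative assumptions, not details taken from the paper.

# Minimal sketch (not the authors' released code): fuse mean-pooled
# BERT embeddings with LDA document-topic distributions, then run
# K-means to extract interest clusters.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from gensim import corpora
from gensim.models import LdaModel
from sklearn.cluster import KMeans

# Pre-segmented microblog posts (whitespace-separated tokens); toy data.
posts = [
    "喜欢 篮球 比赛 直播",
    "今晚 篮球 总决赛 精彩",
    "新款 手机 评测 发布",
    "手机 摄像头 体验 不错",
]

# --- BERT contextual embeddings (mean-pooled last hidden states) ---
tok = AutoTokenizer.from_pretrained("bert-base-chinese")
bert = AutoModel.from_pretrained("bert-base-chinese")
bert.eval()  # disable dropout for deterministic embeddings
with torch.no_grad():
    enc = tok(posts, padding=True, truncation=True, return_tensors="pt")
    hidden = enc_out = bert(**enc).last_hidden_state        # (batch, seq, 768)
    mask = enc["attention_mask"].unsqueeze(-1)              # ignore padding
    bert_vecs = (hidden * mask).sum(1) / mask.sum(1)        # (batch, 768)
bert_vecs = bert_vecs.numpy()

# --- LDA document-topic distributions ---
docs = [p.split() for p in posts]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
topic_vecs = np.array(
    [[prob for _, prob in lda.get_document_topics(b, minimum_probability=0.0)]
     for b in bow]
)                                                           # (batch, n_topics)

# --- Fusion + K-means interest extraction ---
# In practice the two feature blocks may be weighted or normalized
# before concatenation, since 768 BERT dimensions dwarf the topic block.
fused = np.hstack([bert_vecs, topic_vecs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(fused)
print(labels)  # cluster label = extracted interest group for each post

Plain concatenation is the simplest fusion strategy; alternatives such as scaling the topic block or projecting both feature spaces to a common dimension before clustering would also fit the description in the abstract.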