%0 Journal Article
%T 基于改进LDA模型的主题识别及演化研究——以软件开源领域为例
Research on Topic Recognition and Evolution Based on Improved LDA Model—Taking the Field of Software Open Source as an Example
%A 高翔菲
%A 董平军
%J Hans Journal of Data Mining
%P 55-70
%@ 2163-1468
%D 2025
%I Hans Publishing
%R 10.12677/hjdm.2025.151005
%X 目的:针对基于LDA模型进行主题识别及演化分析方法在主题数量选择困难、时间窗口划分主观性强等方面的局限提出优化改进,从而推动主题识别及演化分析方法的进步。方法:结合TF-IDF算法和Word2Vec词向量技术计算主题向量,减少主题生成时常用词汇的影响,同时实现主题向量的语义表达。在主题演化过程中提出基于主题语义距离变化的方法划分时间窗口,跟踪目标领域主题强度和主题内容的演化趋势。最后以软件开源领域研究文献为例进行实证研究。结果:研究结果显示,本文提出的优化方法能够有效识别领域的研究主题及热点主题,跟踪主题随时间演化的路径,并可视化呈现。结论:软件开源研究存在六个关键主题,其中“开源治理”和“市场竞争”是该研究领域的热点主题。从主题内容的演变来看,软件开源的研究正从个人自发参与的自治动机转向企业与政府等组织层面的参与。
Purpose: To address limitations of LDA-based topic identification and evolution analysis, namely the difficulty of selecting the number of topics and the subjectivity of time window partitioning, and to propose improvements that advance these methods. Method: Topic vectors are computed by combining the TF-IDF algorithm with Word2Vec word embeddings, which reduces the influence of common words during topic generation and gives the topic vectors a semantic representation. For topic evolution, time windows are divided according to changes in topic semantic distance, and the evolution of topic intensity and topic content in the target field is tracked. An empirical study is then conducted on research literature in the software open source field. Results: The proposed method effectively identifies the research topics and hot topics of the field, tracks the paths along which topics evolve over time, and presents them visually. Conclusion: Software open source research has six key topics, among which “open source governance” and “market competition” are the hot topics. In terms of topic content, research on open source software is shifting from the autonomous motivation of spontaneous individual participation to participation at the organizational level by enterprises and governments.
%K LDA模型
%K Word2Vec模型
%K 主题识别及演化
%K 时间窗口划分
%K 软件开源
%K LDA
%K Word2Vec
%K Topic Recognition and Evolution
%K Time Window Division
%K Open Source Software
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=105200