2018
Research Topic Identification Method for Multiple Data Sources Based on Causal Ridge Regression
Abstract:
To address the problem of identifying research topics from multiple data sources, this paper presents a new multi-source research topic identification method based on causal ridge regression. The method first defines evaluation indicators for the key parameters of multi-source research topic identification, such as the citation weight and status density of topic terms. It then builds a feature function from the morphological characteristics of research topics and derives the concrete identification procedure from causal ridge regression. Finally, simulation experiments are used to examine the key factors that affect the method. The results show that, compared with Naive Bayes, KNN, and MGe-LDA, the proposed method performs considerably better in terms of value citation, citation weight, and similarity to frontier topics.
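The abstract does not define the causal variant of ridge regression, the feature function, or how the indicators enter the model. As a point of reference only, the following is a minimal sketch of the standard (non-causal) ridge estimator, β = (XᵀX + λI)⁻¹Xᵀy, on which the method is presumably built. It is not the authors' algorithm: the function names `ridge_fit`, `solve` and `predict` and the toy data are hypothetical, there is no intercept term, and no causal weighting or topic-specific features (citation weight, status density) are modeled.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X: Matrix, y: Vector, lam: float) -> Vector:
    """Textbook ridge regression: solve (X^T X + lam * I) beta = X^T y.

    Hypothetical helper: no intercept and no causal weighting, so this is
    only the standard estimator, not the paper's causal variant.
    """
    n_features = len(X[0])
    # Normal-equation matrix with the L2 penalty added to the diagonal.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n_features)] for i in range(n_features)]
    # Right-hand side X^T y.
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n_features)]
    return solve(A, b)


def predict(X: Matrix, beta: Vector) -> Vector:
    """Linear predictions X beta."""
    return [sum(xi * bi for xi, bi in zip(row, beta)) for row in X]


if __name__ == "__main__":
    # Toy data (hypothetical): three samples, two features.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [1.0, 2.0, 3.0]
    print(ridge_fit(X, y, lam=0.0))  # approximately [1.0, 2.0] (ordinary least squares)
    print(ridge_fit(X, y, lam=1.0))  # approximately [0.875, 1.375] (shrunk toward zero)
```

Raising λ shrinks the coefficients toward zero; with λ = 0 the estimator reduces to ordinary least squares. The penalty keeps XᵀX + λI invertible even when features are strongly correlated, which is the usual reason to prefer ridge regression over plain least squares.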