2017
A PageRank-Based Keyword Extraction Algorithm for News
Abstract:
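Existing keyword extraction algorithms based on complex networks ignore the natural-language characteristics of the text when they build the weighted text network, and they rarely draw on classic algorithms from the complex-network field when extracting keywords. This paper introduces a word-frequency sharing weight, which uses word-frequency characteristics to weight the edges between nodes. Building on this weighting and on the PageRank algorithm, and defining a position weight coefficient that reflects the habits of human language, the paper proposes a new news keyword extraction algorithm, LTWPR, which takes both the local and the global features of the text network into account. Extensive experiments on a Sina News corpus show that the algorithm quickly and effectively covers the keywords annotated by the news authors and achieves better extraction results.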
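The abstract does not give the LTWPR formulas, so the following Python sketch only illustrates the general idea of a frequency-weighted, position-biased PageRank over a word co-occurrence network. Everything specific in it is an assumption, not the authors' method: the input is taken to be already segmented into words; edges link adjacent words in a sentence; the "frequency sharing" weight is interpreted as each node splitting its score among its neighbours in proportion to their term frequency; and the position coefficients (hypothetical `title_boost=2.0` for title words, `lead_boost=1.5` for words in the first body sentence, 1.0 otherwise) are used as the random-jump distribution. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_cooccurrence_graph(sentences, window=2):
    """Undirected graph linking words at most `window - 1` positions apart
    in the same sentence (window=2 links adjacent words only)."""
    neighbors = defaultdict(set)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(i + 1, min(i + window, len(sent))):
                if sent[j] != w:
                    neighbors[w].add(sent[j])
                    neighbors[sent[j]].add(w)
    return dict(neighbors)


def frequency_shares(neighbors, freq):
    """Assumed frequency-sharing weight: a node passes its score to each
    neighbour in proportion to that neighbour's term frequency."""
    shares = {}
    for u, nbrs in neighbors.items():
        total = sum(freq[v] for v in nbrs)
        shares[u] = {v: freq[v] / total for v in nbrs}
    return shares


def position_coefficients(title, body, title_boost=2.0, lead_boost=1.5):
    """Hypothetical position weights: title and lead-sentence words are boosted."""
    coef = {w: 1.0 for sent in [title] + body for w in sent}
    for w in (body[0] if body else []):
        coef[w] = max(coef[w], lead_boost)
    for w in title:
        coef[w] = max(coef[w], title_boost)
    return coef


def weighted_pagerank(title, body, d=0.85, tol=1e-10, max_iter=200):
    """Return all words ranked by a position-biased, frequency-weighted PageRank."""
    sentences = [title] + body
    freq = Counter(w for sent in sentences for w in sent)
    shares = frequency_shares(build_cooccurrence_graph(sentences), freq)
    coef = position_coefficients(title, body)
    z = sum(coef.values())
    jump = {w: c / z for w, c in coef.items()}  # position-biased random jump
    score = {w: 1.0 / len(freq) for w in freq}
    for _ in range(max_iter):
        # Words with no edges hand their score back through the random jump.
        dangling = sum(s for w, s in score.items() if w not in shares)
        new = {w: (1 - d + d * dangling) * jump[w] for w in freq}
        for u, out in shares.items():
            for v, share in out.items():
                new[v] += d * score[u] * share
        converged = sum(abs(new[w] - score[w]) for w in freq) < tol
        score = new
        if converged:
            break
    return sorted(score.items(), key=lambda kv: -kv[1])


# Toy, already-segmented news item (English tokens for readability).
title = ["storm", "warning", "beijing"]
body = [["storm", "hits", "beijing"],
        ["beijing", "issues", "storm", "warning"],
        ["residents", "stay", "home"]]
ranking = weighted_pagerank(title, body)
print([w for w, _ in ranking[:3]])
```

Using the position coefficients as the random-jump distribution keeps the scores a probability distribution that sums to 1; multiplying the final PageRank scores by the coefficients would be an equally plausible reading of the abstract.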