2016

Feature selection combining global and local information (GLFS)

DOI: 10.6040/j.issn.1671-9352.1.2015.E17

Keywords: global and local information, feature selection, text classification, ALOFT, feature weight


Abstract:

The quality of the feature selection method directly affects text classification performance. Traditional feature selection algorithms choose features globally, which ignores the influence of local features on classification and can even leave many training documents with no features at all. To address this, the paper builds on traditional methods, which mainly consider global features of the document collection, by also taking into account each word's contribution rate to an individual document and, combined with the ALOFT method, proposes a feature selection algorithm that unifies global and local information (GLFS). Experimental results on the Reuters and Fudan document collections show that the proposed algorithm guarantees that every document has at least one feature word while also improving classification performance. Finally, the paper discusses how feature weights are determined; after the feature weights are recomputed, classification performance improves considerably.
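The idea of guaranteeing at least one feature per document can be illustrated with a minimal Python sketch. It is not the paper's GLFS formula, which the abstract does not give. It assumes an ALOFT-style rule (each document contributes its single best-scoring term) and uses a hypothetical local weight, the term's share of the document's tokens, as a stand-in for the "contribution rate" of a word to a document. The function name `select_features`, the scores, and the toy documents are all illustrative assumptions.

```python
from collections import Counter


def select_features(docs, global_score, use_local=True):
    """Pick, for every document, its single best-scoring term.

    docs: list of token lists.
    global_score: dict mapping term -> global relevance score
        (for example a chi-square or information-gain value).
    use_local: if True, multiply the global score by the term's share of
        the document's scored tokens. This weighting is an assumption made
        for illustration, not the formula from the paper.
    """
    selected = []  # ordered list of chosen terms
    seen = set()   # avoids adding the same term twice
    for tokens in docs:
        # Count only terms that have a global score.
        counts = Counter(t for t in tokens if t in global_score)
        if not counts:
            continue  # nothing to choose from in this document
        total = sum(counts.values())

        def score(term):
            local = counts[term] / total if use_local else 1.0
            return global_score[term] * local

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(counts), key=score)
        if best not in seen:
            seen.add(best)
            selected.append(best)
    return selected


# Toy data (hypothetical scores and documents).
docs = [
    ["oil", "price", "oil"],
    ["price", "market", "market", "market"],
    ["football"],
]
global_score = {"oil": 0.9, "price": 0.5, "market": 0.4, "football": 0.1}

global_only = select_features(docs, global_score, use_local=False)
with_local = select_features(docs, global_score, use_local=True)
print(global_only)  # ['oil', 'price', 'football']
print(with_local)   # ['oil', 'market', 'football']
```

In this toy data, a purely global top-2 cut would keep only "oil" and "price" and leave the third document with no features, whereas the per-document rule always retains "football". With the local weight enabled, the second document contributes "market" instead of "price", because "market" dominates that document even though its global score is lower.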

