A Text Feature Selection Method for Hierarchical Classification

pp. 103-110

Keywords: text feature selection, category hierarchy relevance, hierarchical classification, machine learning


Abstract:

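This paper proposes a text feature selection method for hierarchical classification. It first introduces the concept of category hierarchy relevance, which is computed from the classification tree and from the probability distributions of the training data at different levels of that tree; this yields the importance of each category in the tree. Based on these results, the method then computes each feature's power to discriminate between categories and selects the features with the highest discriminative power to form the feature set used for classification. Experiments show that the method outperforms traditional methods both in the quality of the selected features and on classification measures such as accuracy, F1, and micro-Precision.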
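The last step of the pipeline described in the abstract (scoring features by category-weighted discriminative power, then keeping the top-ranked ones) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the function name `select_features`, the per-category score (the absolute difference between a term's document frequency inside and outside a category), and the `weights` dictionary, which merely stands in for the category importance that the paper derives from category hierarchy relevance.

```python
from collections import defaultdict

def select_features(docs, weights, k):
    """Rank terms by a category-weighted discrimination score and keep the top k.

    docs    : list of (set_of_terms, category_label) pairs
    weights : dict mapping category_label -> importance (hypothetical stand-in
              for the paper's hierarchy-based category importance)
    k       : number of features to keep
    """
    n_docs = defaultdict(int)                   # documents per category
    df = defaultdict(lambda: defaultdict(int))  # df[term][category] = doc frequency
    for terms, cat in docs:
        n_docs[cat] += 1
        for t in set(terms):
            df[t][cat] += 1

    total = len(docs)
    scores = {}
    for t, per_cat in df.items():
        df_total = sum(per_cat.values())
        score = 0.0
        for cat, n in n_docs.items():
            in_cat = per_cat.get(cat, 0)
            p_in = in_cat / n                   # share of docs in cat containing t
            rest = total - n
            p_out = (df_total - in_cat) / rest if rest else 0.0
            # Assumed score: weighted gap between in-category and out-of-category rates.
            score += weights.get(cat, 1.0) * abs(p_in - p_out)
        scores[t] = score

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores

# Toy corpus with illustrative labels and weights (not from the paper).
docs = [
    ({"goal", "team"}, "sports/football"),
    ({"goal", "match"}, "sports/football"),
    ({"racket", "match"}, "sports/tennis"),
    ({"stock", "market"}, "finance"),
]
weights = {"sports/football": 1.0, "sports/tennis": 1.0, "finance": 0.5}
selected, scores = select_features(docs, weights, k=2)
print(selected)
```

In this toy corpus, "match" appears in both sports leaf categories and so scores lower than "goal" or "racket", each of which is concentrated in a single category. In the paper's method the weights would come from the classification tree and the training-data distributions rather than being set by hand.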

