2017
Feature selection method based on LS-SVM and the fuzzy supplementary criterion
Abstract: Traditional feature selection algorithms rely on a single scalar metric, which makes it difficult to balance generalization performance against dimensionality-reduction performance. To overcome this shortcoming, a new feature selection algorithm, LS-SVM-FSC (least squares support vector machines and fuzzy supplementary criterion), was proposed. A kernel-based least squares support vector machine (LS-SVM) was trained as a binary classifier on each individual feature, a new fuzzy membership function was used to obtain each sample's degree of membership in its own class, and a fuzzy supplementary criterion was then applied to select the feature subset with minimal redundancy and maximal relevance. Experiments on nine datasets showed that, compared with ten other feature selection methods and seven membership determination methods, the proposed algorithm achieved high classification accuracy and strong dimensionality reduction while retaining fast learning speed on high-dimensional datasets.
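The abstract outlines a three-step pipeline: train a single-feature LS-SVM for every feature, map each classifier's output to a fuzzy membership of every sample in its own class, and greedily pick features whose memberships best complement those already selected. The sketch below only illustrates that pipeline under stated assumptions: the RBF kernel, the sigmoid membership mapping, the fuzzy-union coverage gain used as the "supplementary" criterion, and all parameter names (gamma, sigma, k) are placeholder choices, not the formulas from the paper.

```python
# Minimal sketch of a per-feature LS-SVM + fuzzy-membership selection loop.
# The membership function and the supplementary criterion are illustrative
# stand-ins, not the authors' definitions.
import numpy as np

def lssvm_train(K, y, gamma=1.0):
    """Solve the standard LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel between two 1-D feature vectors (one kernel per single feature)."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def per_feature_membership(X, y, gamma=1.0, sigma=1.0):
    """Train one LS-SVM per feature and turn its decision value f(x) into a
    fuzzy membership of each sample in its own class via a sigmoid (placeholder)."""
    n, d = X.shape
    mu = np.zeros((n, d))
    for j in range(d):
        K = rbf_kernel(X[:, j], X[:, j], sigma)
        b, alpha = lssvm_train(K, y, gamma)
        f = K @ alpha + b
        mu[:, j] = 1.0 / (1.0 + np.exp(-y * f))
    return mu

def fuzzy_supplementary_selection(mu, k):
    """Greedy forward selection: add the feature whose memberships most increase
    the fuzzy-union coverage of the already-selected set (an illustrative reading
    of a minimal-redundancy, maximal-relevance supplementary criterion)."""
    n, d = mu.shape
    covered = np.zeros(n)
    selected = []
    for _ in range(k):
        gains = [np.sum(np.maximum(covered, mu[:, j]) - covered)
                 if j not in selected else -np.inf for j in range(d)]
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, mu[:, best])
    return selected

# Toy usage: 40 samples, 6 features, labels in {-1, +1}; features 0 and 2 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = np.where(X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=40) >= 0, 1.0, -1.0)
print(fuzzy_supplementary_selection(per_feature_membership(X, y), k=3))
```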