[1] Weston J, Elisseeff A, Schölkopf B, et al. Use of the zero norm with linear models and kernel methods[J]. Journal of Machine Learning Research, 2003, 3: 1439-1461.
[2] Tibshirani R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society, Series B, 1996, 58(1): 267-288.
[3] Zhang Chunkai, Hu Hong. Feature selection in SVM based on the hybrid of enhanced genetic algorithm and mutual information[M]//Torra V, Narukawa Y, Valls A, et al. Modeling Decisions for Artificial Intelligence. Berlin: Springer, 2006.
[4] Lian Heng. On feature selection with principal component analysis for one-class SVM[J]. Pattern Recognition Letters, 2012, 33(9): 1027-1031.
[5] Li Boyang, Wang Qianwei, Hu Jinglu. Feature subset selection: a correlation-based SVM filter approach[J]. IEEJ Transactions on Electrical and Electronic Engineering, 2011, 6(2): 173-179.
[6] He Qiang, Xie Zongxia, Hu Qinghua, et al. Neighborhood based sample and feature selection for SVM classification learning[J]. Neurocomputing, 2011, 74(10): 1585-1594.
[7] Chen Feilong, Li F C. Combination of feature selection approaches with SVM in credit scoring[J]. Expert Systems with Applications, 2010, 37(7): 4902-4909.
[8] Weinberger K Q, Sha F, Saul L K. Learning a kernel matrix for nonlinear dimensionality reduction[C]. Proceedings of the Twenty-First International Conference on Machine Learning, Banff, July 4-8, 2004.
[9] Mason L, Bartlett P, Baxter J. Improved generalization through explicit optimization of margins[J]. Machine Learning, 2000, 38(3): 243-255.
[10] Kong E B, Dietterich T G. Error-correcting output coding corrects bias and variance[C]. Proceedings of the Twelfth International Conference on Machine Learning, California, July 9-12, 1995.
[11] Breiman L. Bias, variance and arcing classifiers[R]. Working Paper, University of California, 1996.
[12] Wolberg W H, Mangasarian O L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology[J]. Proceedings of the National Academy of Sciences, 1990, 87(23): 9193-9196.
Wei Liwei, Chen Zhenyu, Li Jianping. Evolution strategies based adaptive L-p LS-SVM[J]. Information Sciences, 2011, 181(14): 3000-3016.
[17] Lanckriet G, Cristianini N, Bartlett P, et al. Learning the kernel matrix with semidefinite programming[J]. Journal of Machine Learning Research, 2004, 5: 27-72.
Guyon I, Elisseeff A. An introduction to variable and feature selection[J]. Journal of Machine Learning Research, 2003, 3: 1157-1182.
[20] Ichino M, Sklansky J. Optimum feature selection by zero-one integer programming[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1984, 14: 737-746.
[21] Foroutan I, Sklansky J. Feature selection for automatic classification of non-Gaussian data[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1987, 17(2): 187-198.
[22] Kohavi R, John G H. Wrappers for feature subset selection[J]. Artificial Intelligence, 1997, 97(1-2): 273-324.
[23] Tenenbaum J, Silva V, Langford J. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
[24] Balasubramanian M, Schwartz E L. The isomap algorithm and topological stability[J]. Science, 2002, 295(5552): 7.
[25] Roweis S, Saul L. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.
[26] Rosipal R, Girolami M, Trejo L. Kernel PCA for feature extraction of event related potentials for human signal detection performance[M]//Malmgren B A H, Borga M, Niklasson L. Artificial Neural Networks in Medicine and Biology. Berlin: Springer, 2000.
[27] Rosipal R, Trejo L. Kernel partial least squares regression in reproducing kernel Hilbert space[J]. Journal of Machine Learning Research, 2002, 2: 97-123.
[28] Saunders C, Gammerman A, Vovk V. Ridge regression learning algorithm in dual variables[C]. Proceedings of the 15th International Conference on Machine Learning, Madison, July 24-27, 1998.
[29] Chen Zhenyu, Li Jianping. A multiple kernel support vector machine scheme for simultaneous feature selection and rule-based classification[J]. Artificial Intelligence in Medicine, 2007, 41(2): 161-175.
Yan Shuicheng, Xu Dong, Zhang Benyu, et al. Graph embedding and extensions: A general framework for dimensionality reduction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(1): 40-51.
[32] Graepel T. Kernel matrix completion by semidefinite programming[J]. Lecture Notes in Computer Science, 2002, 2415: 694-699.
[33] Weinberger K Q, Packer B, Saul L. Nonlinear dimensionality reduction by semidefinite programming and kernel matrix factorization[C]. Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Barbados, January 6-8, 2005.
[34] Sha Fei, Saul L. Analysis and extension of spectral methods for nonlinear dimensionality reduction[C]. Proceedings of the 22nd International Conference on Machine Learning, Bonn, August 7-11, 2005.
[35] Freund R, Mizuno S. Interior point methods: current status and future directions[R]. Working Paper, Operations Research Center, 1996.