

2017

Random undersampling and POSS method for software defect prediction

DOI: 10.6040/j.issn.1672-3961.0.2016.304

FANG Hao, LI Yun

Keywords: software defect prediction, class imbalance, data sampling, feature selection


Abstract:

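Class imbalance in software defect data limits the performance of classifiers. To address this problem, this paper introduces the POSS (Pareto optimization for subset selection) feature selection algorithm and random undersampling into software defect prediction, and uses a support vector machine (SVM) to build the prediction model. Experimental results show that multiple rounds of random undersampling effectively address the class imbalance in software defect data, while POSS, which treats subset selection as a bi-objective optimization problem, improves classification accuracy. The proposed method outperforms the Relief, Fisher, and MI (mutual information) feature selection algorithms.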
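
The abstract credits multiple rounds of random undersampling with handling the class imbalance. A minimal sketch in Python, assuming binary labels where 1 marks a defective module (the minority class): each round keeps every defective sample and draws an equal number of non-defective samples without replacement, using an independent draw per round. The abstract does not say how the models trained on the different rounds are combined, so `majority_vote` is an assumed, commonly used option; all function names here are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, rng, minority_label=1):
    """Return one balanced subset: every minority-class (defective) sample plus an
    equal-sized draw, without replacement, from the majority class."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    kept = minority + rng.sample(majority, min(len(minority), len(majority)))
    rng.shuffle(kept)  # avoid class-ordered blocks in the training subset
    return [X[i] for i in kept], [y[i] for i in kept]


def multiple_undersample(X, y, n_rounds, seed=0, minority_label=1):
    """Repeat random undersampling n_rounds times with independent draws."""
    rng = random.Random(seed)
    return [random_undersample(X, y, rng, minority_label) for _ in range(n_rounds)]


def majority_vote(predictions):
    """Assumed combination rule: vote sample by sample over per-model label lists.
    On a tie, Counter.most_common returns the label encountered first."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Toy data: 2 defective modules (label 1) among 10.
X = [[float(i)] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for X_sub, y_sub in multiple_undersample(X, y, n_rounds=3, seed=42):
    print(y_sub.count(1), y_sub.count(0))  # 2 2 on every round
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```

In the paper's setting each balanced subset would presumably train its own SVM; the number of rounds is not given in the abstract and is a free parameter here.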
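
The abstract describes POSS as treating feature subset selection as a bi-objective optimization problem. A minimal sketch in Python, assuming the POSS loop as published by Qian, Yu and Zhou (NIPS 2015): it jointly minimizes a subset's error and its size, keeps an archive of mutually non-dominated subsets, mutates a randomly chosen archive member each iteration, and finally returns the best archived subset with at most k features. The `error` callback, the iteration budget and the toy objective are hypothetical placeholders; the abstract does not say which classifier-based criterion the authors optimized.

```python
import random


def poss(n_features, k, error, iterations, seed=0):
    """Sketch of POSS: minimize (error(subset), |subset|) jointly, keep a Pareto
    archive, and return the best archived subset with at most k features.
    Assumption: `error` maps a frozenset of feature indices to a float (lower is
    better) and stands in for whatever evaluation criterion is actually used."""
    rng = random.Random(seed)

    def objectives(bits):
        size = sum(bits)
        if size >= 2 * k:  # subsets of 2k or more features are treated as infeasible
            return (float("inf"), size)
        return (error(frozenset(i for i, b in enumerate(bits) if b)), size)

    def weakly_dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1]

    empty = (0,) * n_features
    archive = {empty: objectives(empty)}  # bit vector -> (error, size)
    for _ in range(iterations):
        parent = rng.choice(list(archive))
        # Bit-wise mutation: flip each bit independently with probability 1/n.
        child = tuple(1 - b if rng.random() < 1.0 / n_features else b for b in parent)
        f_child = objectives(child)
        # Discard the child if an archived solution strictly dominates it.
        if any(weakly_dominates(f, f_child) and f != f_child for f in archive.values()):
            continue
        # Drop every archived solution the child weakly dominates, then add it.
        archive = {s: f for s, f in archive.items() if not weakly_dominates(f_child, f)}
        archive[child] = f_child
    best_f, best_bits = min((f, s) for s, f in archive.items() if f[1] <= k)
    return frozenset(i for i, b in enumerate(best_bits) if b), best_f[0]


# Hypothetical toy objective: features 0, 2 and 4 help; each feature costs 0.01.
weights = {0: 0.4, 2: 0.3, 4: 0.1}


def toy_error(subset):
    return 1.0 - sum(weights.get(i, 0.0) for i in subset) + 0.01 * len(subset)


subset, err = poss(n_features=6, k=2, error=toy_error, iterations=2000, seed=1)
print(sorted(subset), round(err, 2))
```

Because no non-empty subset can weakly dominate the empty one, the archive always holds a size-0 entry, so the final selection over subsets of size at most k always has a candidate.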
