Software Engineering and Applications 2023

Cost Sensitive Boosting Software Defect Prediction Method for Integrated Feature Selection

DOI: 10.12677/SEA.2023.126096, PP. 975-988

唐鹤龙, 李英梅

Keywords: Software Defect Prediction, Cost-Sensitive, Feature Selection, Ensemble Learning

Abstract:

Latent defects in software can have serious consequences, and software defect prediction techniques can detect defective modules in time. However, class imbalance and high-dimensional features in software defect datasets degrade the predictive performance of models. To address this, a cost-sensitive Boosting software defect prediction method with integrated feature selection (Cost-Sensitive Boosting for Feature Selection, CSBFS) is proposed. CSBFS first applies a cost-sensitive feature selection algorithm, which computes the contribution value of each feature to the prediction result, adjusts that value according to the costs of the different misclassification types, and selects the features with a positive contribution as the feature subset, thereby addressing the high-dimensionality problem. Next, this feature selection algorithm is embedded in the Boosting algorithm: in each Boosting iteration, a suitable feature subset is selected for the base learner, which increases the diversity among base learners. In addition, the weights of the misclassification types are adjusted so that Type I misclassified samples receive higher weights, which alleviates the class imbalance problem and further improves prediction performance. Experiments on 20 public datasets, using F-measure, Recall, AUC, and G-mean, among others, as evaluation metrics, confirm the effectiveness of CSBFS.
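The loop described in the abstract can be illustrated with a minimal sketch, which should not be read as the authors' implementation. Several choices here are assumptions made for illustration only: base learners are one-feature decision stumps; a feature's "contribution value" is approximated as the reduction in weighted misclassification cost relative to the best constant guess (the abstract does not specify the contribution measure); the costlier error is taken to be a defective module predicted as clean; and the names `cost_fn`, `cost_fp`, `select_features`, and `fit_csbfs_sketch` are hypothetical.

```python
import math

def stump_predict(row, feat, thr, sign):
    """Predict +1 (defective) or -1 (clean) from a single feature threshold."""
    return sign if row[feat] > thr else -sign

def weighted_cost(preds, y, w, cost_fn, cost_fp):
    """Weighted misclassification cost; a missed defect (y=+1) costs cost_fn."""
    return sum(wi * (cost_fn if t == 1 else cost_fp)
               for p, t, wi in zip(preds, y, w) if p != t)

def best_stump(X, y, w, feat, cost_fn, cost_fp):
    """Return (cost, threshold, sign) of the cheapest stump on one feature."""
    best = None
    for thr in sorted({row[feat] for row in X}):
        for sign in (1, -1):
            preds = [stump_predict(r, feat, thr, sign) for r in X]
            c = weighted_cost(preds, y, w, cost_fn, cost_fp)
            if best is None or c < best[0]:
                best = (c, thr, sign)
    return best

def select_features(X, y, w, cost_fn, cost_fp):
    """Keep only features with a positive cost-adjusted contribution.

    Assumed contribution: cost of the best constant guess minus stump cost.
    """
    baseline = min(weighted_cost([s] * len(y), y, w, cost_fn, cost_fp)
                   for s in (1, -1))
    selected = {}
    for f in range(len(X[0])):
        c, thr, sign = best_stump(X, y, w, f, cost_fn, cost_fp)
        if baseline - c > 1e-12:
            selected[f] = (c, thr, sign)
    return selected

def fit_csbfs_sketch(X, y, rounds=5, cost_fn=3.0, cost_fp=1.0):
    """AdaBoost-style loop that re-selects features under the current weights."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        selected = select_features(X, y, w, cost_fn, cost_fp)
        if not selected:
            break
        feat = min(selected, key=lambda f: selected[f][0])
        _, thr, sign = selected[feat]
        preds = [stump_predict(r, feat, thr, sign) for r in X]
        err = sum(wi for p, t, wi in zip(preds, y, w) if p != t)
        if err >= 0.5:
            break  # base learner is no better than chance under these weights
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, thr, sign))
        # Standard exponential reweighting, with an extra boost for missed defects.
        new_w = []
        for p, t, wi in zip(preds, y, w):
            wi *= math.exp(-alpha * t * p)
            if p != t and t == 1:
                wi *= cost_fn
            new_w.append(wi)
        z = sum(new_w)
        w = [wi / z for wi in new_w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, thr, s) for a, f, thr, s in ensemble)
    return 1 if score > 0 else -1

# Toy data: feature 0 separates the classes, feature 1 is constant noise.
X = [[1, 5], [2, 5], [3, 5], [8, 5], [9, 5], [10, 5]]
y = [-1, -1, -1, 1, 1, 1]
model = fit_csbfs_sketch(X, y)
```

On this toy data the constant feature has zero contribution and is dropped in every round, so each stump splits on feature 0. Because selection is repeated under the updated sample weights, different rounds can in general pick different subsets, which is the source of diversity the abstract refers to.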
