A central objective of software development is to locate and fix defects early, before they surface under diverse operating conditions. Because many software development activities are performed by people, bugs can be introduced throughout development and cause failures later on. Predicting software defects in the early stages of development has therefore become a primary interest in software engineering. Various software defect prediction (SDP) approaches that rely on software metrics have been proposed over the last two decades. Bagging, support vector machine (SVM), decision tree (DS), and random forest (RF) classifiers are known to perform well at predicting defects. This paper studies and compares these supervised machine learning and ensemble classifiers on 10 NASA datasets. The experimental results show that, in the majority of cases, RF was the best-performing of these classifiers.
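To make the idea of a metric-based ensemble classifier concrete, the following is a minimal sketch of bagging, one of the techniques named above. It is not the experimental setup of this paper. The data are synthetic, the two metrics (`loc`, `cc`) and the helper names are hypothetical, and one-level decision stumps stand in for full decision trees to keep the example short.

```python
import random
from collections import Counter


def make_modules(n, rng):
    """Hypothetical synthetic data: (loc, cc) metrics with a defect label."""
    rows = []
    for _ in range(n):
        loc = rng.uniform(10, 500)  # lines of code
        cc = rng.uniform(1, 30)     # cyclomatic complexity
        # Assumption: larger, more complex modules are defective more often.
        p = 0.05 + 0.6 * (loc / 500) * (cc / 30)
        rows.append(((loc, cc), 1 if rng.random() < p else 0))
    return rows


def fit_stump(rows):
    """Return the (feature, threshold) split with the fewest training errors.

    The stump predicts defective (1) when the feature exceeds the threshold.
    """
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            errors = sum((1 if xi[f] > t else 0) != y for xi, y in rows)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return f, t


def predict_stump(stump, x):
    f, t = stump
    return 1 if x[f] > t else 0


def fit_bagging(rows, n_estimators, rng):
    """Train each stump on a bootstrap sample drawn with replacement."""
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(rows) for _ in rows]
        models.append(fit_stump(sample))
    return models


def predict_bagging(models, x):
    """Majority vote over the stumps (use an odd count to avoid ties)."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)


rng = random.Random(0)  # fixed seed so runs are repeatable
train, test = make_modules(200, rng), make_modules(100, rng)
stump = fit_stump(train)
ensemble = fit_bagging(train, 25, rng)
print("single stump :", accuracy(lambda x: predict_stump(stump, x), test))
print("bagged stumps:", accuracy(lambda x: predict_bagging(ensemble, x), test))
```

Accuracy is used here only for brevity. Defect datasets are typically imbalanced, so a real comparison would more likely report measures such as precision, recall, or F-measure, and would use full tree learners from an established library.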