
Software Defect Prediction Using Supervised Machine Learning and Ensemble Techniques: A Comparative Study

DOI: 10.4236/jsea.2019.125007, PP. 85-100

Keywords: Machine Learning, Ensembles, Prediction, Software Metrics, Software Defect



An essential objective of software development is to locate and fix defects as early as possible, before they surface under diverse circumstances. Many software development activities are performed by humans, which can introduce defects during development and cause failures in the near future. Thus, predicting software defects in the early stages has become a primary interest in the field of software engineering. Various software defect prediction (SDP) approaches that rely on software metrics have been proposed in the last two decades. Bagging, support vector machine (SVM), decision tree (DT), and random forest (RF) classifiers are known to perform well at predicting defects. This paper studies and compares these supervised machine learning and ensemble classifiers on 10 NASA datasets. The experimental results showed that, in the majority of cases, RF was the best-performing classifier.
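As a rough illustration of the ensemble idea mentioned in the abstract, the following is a minimal sketch of bagging (bootstrap aggregation with majority voting) over one-level decision trees. It is not the paper's implementation or experimental setup. The data are synthetic, the two metrics (lines of code and cyclomatic complexity) and the labeling rule are assumptions made for the example, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-level decision tree: the (feature, threshold) split
    with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict_bagging(ensemble, row):
    """Combine the stumps' predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


# Synthetic modules: (lines of code, cyclomatic complexity).
# Assumed labeling rule for this sketch only: defective (1) if complexity > 10.
data_rng = random.Random(42)
rows = [(data_rng.randint(10, 500), data_rng.randint(1, 20)) for _ in range(200)]
labels = [1 if cc > 10 else 0 for _, cc in rows]

train_rows, test_rows = rows[:150], rows[150:]
train_labels, test_labels = labels[:150], labels[150:]

ensemble = fit_bagging(train_rows, train_labels)
predictions = [predict_bagging(ensemble, r) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(f"Held-out accuracy of the bagged stumps: {accuracy:.2f}")
```

A random forest follows the same bootstrap-and-vote pattern but grows deeper trees and also samples a random subset of features at each split.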



