Software programs are always prone to change for several reasons. In a software product line, change occurs even more often because many software units are carried from one release to the next, and new files are added alongside the reused ones. In this work, we explore the possibility of building a model that predicts which files have a high chance of changing from one release to the next. Knowing which files are likely to change is vital because it helps improve planning, resource management, and cost reduction. It also helps improve the software process, which should lead to better software quality. We further explore how different learners perform in this context and whether prediction improves as the software evolves. Predicting change from one release to the next was successful using logistic regression, J48, and random forest, with accuracy and precision between 72% and 100%, recall between 74% and 100%, and F-score between 80% and 100%. We also found no clear evidence that prediction performance improves as the project evolves.
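The reported figures are standard binary-classification metrics computed from a confusion matrix of predicted versus actual change-prone files. As a minimal sketch (not the paper's actual pipeline, and with hypothetical counts), they can be derived as follows:

```python
# Sketch: computing accuracy, precision, recall, and F-score for a binary
# change-prediction task from confusion-matrix counts. All counts below are
# hypothetical, for illustration only.

def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F-score) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of files classified correctly
    precision = tp / (tp + fp)                   # flagged files that actually changed
    recall = tp / (tp + fn)                      # changed files that were flagged
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f_score

# Hypothetical release: 50 changed files correctly flagged, 10 false alarms,
# 8 changed files missed, 100 unchanged files correctly left unflagged.
acc, prec, rec, f1 = classification_metrics(tp=50, fp=10, fn=8, tn=100)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F-score={f1:.2f}")
```

The same counts would come from any of the classifiers named above (logistic regression, J48, random forest) once their per-file predictions are compared against the files that actually changed in the next release.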