An Approach to Detect Structural Development Defects in Object-Oriented Programs

DOI: 10.4236/ojapps.2024.142036, pp. 494-510

Keywords: Object-Oriented Programming, Structural Development Defect Detection, Software Maintenance, Pre-Trained Models, Feature Extraction, Bagging, Neural Network

Abstract:

Structural development defects are code structures that violate object-oriented design principles. They make programs harder to maintain and degrade software quality over time. Detection approaches range from traditional heuristic algorithms to machine learning methods, and ensemble learning methods have improved the detection of these defects. However, existing approaches do not simultaneously exploit the ability of pre-trained models to extract relevant features and the classification performance of neural networks. Our goal was therefore to design a model that combines a pre-trained model, used through transfer learning to extract relevant features from code excerpts, with a bagging method whose base estimator is a dense neural network for defect classification. To this end, we drew multiple samples of the same size, with replacement, from the imbalanced MLCQ dataset. For each sample, we used the CodeT5-small variant to extract features and trained a bagging method with the RoBERTa classification head neural network as base estimator to classify defects from these features. We then compared this model with RandomForest, an ensemble method that generally yields good results. Our experiments showed that the number of base estimators to use for bagging depends on the defect to be detected. We also observed that no data balancing technique was needed with our model when the imbalance rate was 23%. Finally, for Blob detection, RandomForest achieved a median MCC of 0.36 versus 0.12 for our method, whereas our method performed better on Long Method detection, with a median MCC of 0.53 versus 0.42 for RandomForest. These results suggest that the performance of ensemble methods in detecting structural development defects depends on the specific defect.
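The abstract combines two ideas that can be illustrated in isolation: bagging (training one base estimator per equally sized bootstrap sample drawn with replacement, then aggregating the estimators' predictions) and evaluation with the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below is illustrative only and is not the authors' implementation. It assumes feature vectors have already been extracted (in the paper, by CodeT5-small), replaces the RoBERTa classification head with a hypothetical nearest-centroid base estimator so that it runs on the standard library alone, and assumes majority-vote aggregation. All function names are hypothetical, and the toy data are made up.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Classifier = Callable[[Vector], int]


def bootstrap_sample(X: Sequence[Vector], y: Sequence[int],
                     rng: random.Random) -> Tuple[List[Vector], List[int]]:
    """Draw len(X) examples uniformly at random *with replacement*."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [list(X[i]) for i in idx], [y[i] for i in idx]


def train_centroid_classifier(X: Sequence[Vector], y: Sequence[int]) -> Classifier:
    """Hypothetical stand-in base estimator: nearest class centroid.

    The paper uses a dense network (RoBERTa classification head) instead.
    """
    centroids: Dict[int, Vector] = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid; pick the closest.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def train_bagging(X: Sequence[Vector], y: Sequence[int],
                  n_estimators: int, seed: int = 0) -> List[Classifier]:
    """Fit one base estimator per bootstrap sample of the same size as X."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        Xb, yb = bootstrap_sample(X, y, rng)
        models.append(train_centroid_classifier(Xb, yb))
    return models


def predict_bagging(models: Sequence[Classifier], x: Vector) -> int:
    """Aggregate base-estimator predictions by majority vote (an assumption)."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = defect)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is empty (denominator is 0).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy, made-up 2-D "embeddings": 8 clean snippets (0) vs 4 defective ones (1).
    X = [[i * 0.1, i * 0.1] for i in range(8)] + [[5 + i * 0.1, 5 + i * 0.1] for i in range(4)]
    y = [0] * 8 + [1] * 4
    models = train_bagging(X, y, n_estimators=5, seed=0)
    preds = [predict_bagging(models, x) for x in X]
    print("MCC on training data:", mcc(y, preds))
```

Swapping `train_centroid_classifier` for a trained dense network would bring the sketch closer to the setup described in the abstract without changing the bagging or MCC logic. Treating `n_estimators` as a per-defect setting reflects the abstract's observation that the best number of base estimators depends on the defect being detected.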

References

[1]  Sharma, T. and Kessentini, M. (2021) Qscored: A Large Dataset of Code Smells and Quality Metrics. 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), Madrid, 17-19 May 2021, 590-594.
https://doi.org/10.1109/MSR52588.2021.00080
[2]  Kovačević, A., et al. (2022) Automatic Detection of Long Method and God Class Code Smells through Neural Source Code Embeddings. Expert Systems with Applications, 204, Article ID: 117607.
https://doi.org/10.1016/j.eswa.2022.117607
[3]  Mhawish, M.Y. and Gupta, M. (2020) Predicting Code Smells and Analysis of Predictions: Using Machine Learning Techniques and Software Metrics. Journal of Computer Science and Technology, 35, 1428-1445.
https://doi.org/10.1007/s11390-020-0323-7
[4]  Velioğlu, S. and Selçuk, Y.E. (2017) An Automated Code Smell and Anti-Pattern Detection Approach. 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, 7-9 June 2017, 271-275.
https://doi.org/10.1109/SERA.2017.7965737
[5]  Hadj-Kacem, M. and Bouassida, N. (2018) Towards a Taxonomy of Bad Smells Detection Approaches. Proceedings of the 13th International Conference on Software Technologies, Vol. 1, 164-175.
https://doi.org/10.5220/0006869201980209
[6]  Gupta, A., Suri, B. and Misra, S. (2017) A Systematic Literature Review: Code Bad Smells in Java Source Code. In: Gervasi, O., Murgante, B., Misra, S., et al., Eds., Computational Science and Its Applications—ICCSA 2017, Lecture Notes in Computer Science, Vol. 10408, Springer International Publishing, Cham, 665-682.
https://doi.org/10.1007/978-3-319-62404-4_49
[7]  Sharma, T. and Spinellis, D. (2018) A Survey on Software Smells. Journal of Systems and Software, 138, 158-173.
https://doi.org/10.1016/j.jss.2017.12.034
[8]  Liu, H., Jin, J., Xu, Z., Zou, Y., Bu, Y. and Zhang, L. (2019) Deep Learning Based Code Smell Detection. IEEE Transactions on Software Engineering, 47, 1811-1837.
[9]  Hadj-Kacem, M. and Bouassida, N. (2018) A Hybrid Approach to Detect Code Smells Using Deep Learning. Proceedings of the 13th International Conference on Evaluation of Novel Approaches to Software Engineering ENASE, 1, 137-146.
https://doi.org/10.5220/0006709801370146
[10]  Khleel, N.A.A. and Nehéz, K. (2023) Detection of Code Smells Using Machine Learning Techniques Combined with Data-Balancing Methods. International Journal of Advances in Intelligent Informatics, 9, 402-417.
https://doi.org/10.26555/ijain.v9i3.981
[11]  Sharma, T., Efstathiou, V., Louridas, P. and Spinellis, D. (2021) Code Smell Detection by Deep Direct-Learning and Transfer-Learning. Journal of Systems and Software, 176, Article ID: 110936.
https://doi.org/10.1016/j.jss.2021.110936
[12]  Alazba, A. and Aljamaan, H. (2021) Code Smell Detection Using Feature Selection and Stacking Ensemble: An Empirical Investigation. Information and Software Technology, 138, Article ID: 106648.
https://doi.org/10.1016/j.infsof.2021.106648
[13]  Yu, X., Li, F., Zou, K., Keung, J., Feng, S. and Xiao, Y. (2023) On the Relative Value of Imbalanced Learning for Code Smell Detection.
https://doi.org/10.22541/au.167338512.23766841/v1
[14]  Azeem, M.I., Palomba, F., Shi, L. and Wang, Q. (2019) Machine Learning Techniques for Code Smell Detection: A Systematic Literature Review and Meta-Analysis. Information and Software Technology, 108, 115-138.
https://doi.org/10.1016/j.infsof.2018.12.009
[15]  Peldszus, S., Kulcsár, G., Lochau, M. and Schulze, S. (2016) Continuous Detection of Design Flaws in Evolving Object-Oriented Programs Using Incremental Multi-Pattern Matching. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, Singapore, 3-7 September 2016, 578-589.
https://doi.org/10.1145/2970276.2970338
[16]  Chen, Z., Chen, L., Ma, W. and Xu, B. (2016) Detecting Code Smells in Python Programs. 2016 IEEE International Conference on Software Analysis, Testing and Evolution (SATE), Kunming, 3-4 November 2016, 18-23.
https://doi.org/10.1109/SATE.2016.10
[17]  Hammad, M. and Labadi, A. (2016) Automatic Detection of Bad Smells from Code Changes. International Review on Computers and Software, 11, 1016-1027.
https://doi.org/10.15866/irecos.v11i11.10590
[18]  CNIL. Apprentissage Automatique [Machine Learning].
https://www.cnil.fr/fr/definition/apprentissage-automatique
[19]  Hamdy, A. and Tazy, M. (2020) Deep Hybrid Features for Code Smells Detection. Journal of Theoretical and Applied Information Technology, 98, 2684-2696.
[20]  Hadj-Kacem, M. and Bouassida, N. (2019) Deep Representation Learning for Code Smells Detection Using Variational Auto-Encoder. 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, 14-19 July 2019, 1-8.
https://doi.org/10.1109/IJCNN.2019.8851854
[21]  Škipina, M., Slivka, J., Luburić, N. and Kovačević, A. (2022) Automatic Detection of Feature Envy and Data Class Code Smells Using Machine Learning.
https://doi.org/10.36227/techrxiv.21732059.v1
[22]  Dong, Y. (2013) Modélisation probabiliste de classifieurs d’ensemble pour des problèmes à deux classes [Probabilistic Modeling of Ensemble Classifiers for Two-Class Problems]. PhD Thesis, Université de Technologie de Troyes, Troyes.
[23]  Khan, A.A., Chaudhari, O. and Chandra, R. (2023) A Review of Ensemble Learning and Data Augmentation Models for Class Imbalanced Problems: Combination, Implementation and Evaluation. Expert Systems with Applications, 244, Article ID: 122778.
https://doi.org/10.1016/j.eswa.2023.122778
[24]  Madeyski, L. and Lewowski, T. (2023) Detecting Code Smells Using Industry-Relevant Data. Information and Software Technology, 155, Article ID: 107112.
https://doi.org/10.1016/j.infsof.2022.107112
[25]  Dewangan, S., Rao, R.S., Mishra, A. and Gupta, M. (2022) Code Smell Detection Using Ensemble Machine Learning Algorithms. Applied Sciences, 12, Article No. 10321.
https://doi.org/10.3390/app122010321
[26]  Mamatha, R., Kumari, P.L.S. and Sharada, A. (2024) Enhanced Software Defect Prediction through Homogeneous Ensemble Models. International Journal of Intelligent Systems and Applications in Engineering, 12, 676-684.
[27]  Zakeri-Nasrabadi, M., Parsa, S., Esmaili, E. and Palomba, F. (2023) A Systematic Literature Review on the Code Smells Datasets and Validation Mechanisms. ACM Computing Surveys, 55, Article No. 298.
https://doi.org/10.1145/3596908
[28]  Fontana, F.A., Ferme, V., Zanoni, M. and Roveda, R. (2015) Towards a Prioritization of Code Debt: A Code Smell Intensity Index. 2015 IEEE 7th International Workshop on Managing Technical Debt (MTD), Bremen, 2 October 2015, 16-24.
https://doi.org/10.1109/MTD.2015.7332620
[29]  Arcelli Fontana, F., Mäntylä, M.V., Zanoni, M. and Marino, A. (2016) Comparing and Experimenting Machine Learning Techniques for Code Smell Detection. Empirical Software Engineering, 21, 1143-1191.
https://doi.org/10.1007/s10664-015-9378-4
[30]  Lewowski, T. and Madeyski, L. (2022) Code Smells Detection Using Artificial Intelligence Techniques: A Business-Driven Systematic Review. Developments in Information & Knowledge Management for Business Applications, 3, 285-319.
https://doi.org/10.1007/978-3-030-77916-0_12
[31]  Madeyski, L. and Lewowski, T. (2020) MLCQ: Industry-Relevant Code Smell Data Set.
[32]  Chicco, D. and Jurman, G. (2020) The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation. BMC Genomics, 21, Article No. 6.
https://doi.org/10.1186/s12864-019-6413-7