Authors :
Isaiah Ifeanyi Nweze; Paul Maduabuchi Agu; Ezekiel Nwibo Gabriel; Charles Ugwute; Chukwuka Abraham Nwovu; Boniface Mbalaba Ofoke
Volume/Issue :
Volume 11 - 2026, Issue 5 - May
Google Scholar :
https://tinyurl.com/4nmc4nhh
Scribd :
https://tinyurl.com/45e8ps5m
DOI :
https://doi.org/10.38124/ijisrt/26May670
Abstract :
Software defects remain a critical challenge in modern software engineering: the growing structural complexity
and interdependence of objects in object-oriented designs raise maintenance costs and reduce system reliability.
This paper introduces a machine learning-based system that predicts software defects
from structural software metrics. A structured dataset of 145 software modules with 94 numerical features was analysed,
focusing on six major metrics: Coupling Between Objects (CBO), Depth of Inheritance Tree (DIT), Lack of Cohesion of Methods
(LCOM), Number of Children (NOC), Response for Class (RFC) and Weighted Methods per Class (WMC). The statistical
analysis indicated substantial variation and skewness in complexity-related measures, particularly CBO and RFC,
suggesting the presence of structural outliers. Correlation analysis revealed a significant association among coupling,
cohesion, and response measures, indicating that defect proneness is driven not by any single factor but by interacting
structural properties. The dataset had a moderate class imbalance, with most modules being non-defective. To compare
predictive performance, three supervised machine learning models (Decision Tree, Random Forest and Logistic Regression)
were trained and evaluated using a stratified 70:30 train-test split. The Random Forest model achieved the highest overall
performance, with an accuracy of 72.73%, precision of 75%, recall of 50%, and an F1-score of 0.60, reflecting a good balance
between classification accuracy and generalisation. Logistic Regression achieved higher precision but lower recall, whereas
the Decision Tree model was less accurate but more interpretable. Overall, the results show that combining statistical
analysis with machine learning provides an efficient approach to early defect detection. The paper highlights the importance
of structural complexity measures in identifying defect-prone modules and advocates ensemble learning algorithms to enhance
software quality assurance and overall system reliability.
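The reported Random Forest figures can be checked against one another using the standard definitions of accuracy, precision, recall and F1. The sketch below is illustrative only: the confusion-matrix counts are hypothetical, chosen because they reproduce the reported percentages for a 44-module test set (roughly 30% of 145). They are not taken from the paper, and the function name is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (assumption, not from the paper): 9 true positives,
# 3 false positives, 9 false negatives, 23 true negatives, 44 modules in total.
metrics = classification_metrics(tp=9, fp=3, fn=9, tn=23)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the F1-score follows directly from the harmonic mean of precision and recall: 2 × 0.75 × 0.50 / (0.75 + 0.50) = 0.60, which matches the value stated in the abstract.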
Keywords :
Software Defect Prediction, Machine Learning, Random Forest, Decision Tree, Logistic Regression, Software Metrics, Classification.
References :
- S. Stradowski and L. Madeyski, "Machine learning in software defect prediction: A business-driven systematic mapping study," Inf. Softw. Technol., vol. 155, p. 107128, 2023.
- N. Grattan, D. A. da Costa, and N. Stanger, "The need for more informative defect prediction: A systematic literature review," Inf. Softw. Technol., vol. 171, p. 107456, 2024.
- A. Alsaeedi and M. Z. Khan, "Software defect prediction using supervised machine learning and ensemble techniques: A comparative study," J. Softw. Eng. Appl., vol. 12, no. 5, pp. 85–100, 2019.
- C. Zhou, P. He, C. Zeng, and J. Ma, "Software defect prediction with semantic and structural information of code based on graph neural networks," Inf. Softw. Technol., vol. 152, p. 107057, 2022.
- A. O. Balogun, S. Basri, S. J. Abdulkadir, and A. S. Hashim, "Performance analysis of feature selection methods in software defect prediction: A search method approach," Appl. Sci., vol. 9, no. 13, p. 2764, 2019.
- M. Ali, T. Mazhar, A. Al-Rasheed, T. Shahzad, Y. Y. Ghadi, and M. A. Khan, "Enhancing software defect prediction: A framework with improved feature selection and ensemble machine learning," PeerJ Comput. Sci., vol. 10, p. e1860, 2024.
- S. K. Pandey and A. K. Tripathi, "An empirical study toward dealing with noise and class imbalance issues in software defect prediction," Soft Comput., vol. 25, no. 21, pp. 13465–13492, 2021.
- G. Giray, K. E. Bennin, Ö. Köksal, Ö. Babur, and B. Tekinerdogan, "On the use of deep learning in software defect prediction," J. Syst. Softw., vol. 195, p. 111537, 2023.
- R. van Dinter, C. Catal, G. Giray, and B. Tekinerdogan, "Just-in-time defect prediction for mobile applications: Using shallow or deep learning?" Softw. Qual. J., vol. 31, no. 4, pp. 1281–1302, 2023.
- Z. M. Zain, S. Sakri, and N. H. A. Ismail, "Application of deep learning in software defect prediction: Systematic literature review and meta-analysis," Inf. Softw. Technol., vol. 158, p. 107175, 2023.
- M. M. Jouybari, A. Tajary, M. Fateh, and V. Abolghasemi, "A novel deep neural network structure for software fault prediction," PeerJ Comput. Sci., vol. 10, p. e2270, 2024.
- C. M. Liapis, A. Karanikola, and S. Kotsiantis, "Data-efficient software defect prediction: A comparative analysis of active learning-enhanced models and voting ensembles," Inf. Sci., vol. 676, p. 120786, 2024.
- H. Alsawalqah, N. Hijazi, M. Eshtay, H. Faris, A. A. Radaideh, I. Aljarah, and Y. Alshamaileh, "Software defect prediction using heterogeneous ensemble classification based on segmented patterns," Appl. Sci., vol. 10, no. 5, p. 1745, 2020.
- C. Liu, D. Yang, X. Xia, M. Yan, and X. Zhang, "A two-phase transfer learning model for cross-project defect prediction," Inf. Softw. Technol., vol. 107, pp. 125–136, 2019.
- A. Alazba and H. Aljamaan, "Software defect prediction using stacking generalisation of optimised tree-based ensembles," Appl. Sci., vol. 12, no. 9, p. 4577, 2022.
- A. N. Babatunde, R. O. Ogundokun, L. B. Adeoye, and S. Misra, "Software defect prediction using DAGging meta-learner-based classifiers," Mathematics, vol. 11, no. 12, p. 2714, 2023.
- A. Daza, "Software defect prediction based on a multiclassifier with hyperparameters: Future work," Results Eng., vol. 25, p. 104123, 2025.
- K. K. Bejjanki, J. Gyani, and N. Gugulothu, "Class imbalance reduction (CIR): A novel approach to software defect prediction in the presence of class imbalance," Symmetry, vol. 12, no. 3, p. 407, 2020.
- S. Feng, J. Keung, X. Yu, Y. Xiao, and M. Zhang, "Investigation on the stability of SMOTE-based oversampling techniques in software defect prediction," Inf. Softw. Technol., vol. 139, p. 106662, 2021.
- S. Goyal, "Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction," Artif. Intell. Rev., vol. 55, no. 3, pp. 2023–2064, 2022.
- S. R. Goyal, "A systematic review on AI-based class imbalance handling in software defect prediction," Results Eng., p. 106578, 2025.
- Z. Xu et al., "LDFR: Learning deep feature representation for software defect prediction," J. Syst. Softw., vol. 158, p. 110402, 2019.
- R. Malhotra and S. Priya, "DHG-BiGRU: Dual-attention based hierarchical gated recurrent unit model for software defect prediction," Inf. Softw. Technol., vol. 178, p. 107600, 2025.
- M. Nashaat and J. Miller, "Refining software defect prediction through attentive neural models for code understanding," J. Syst. Softw., vol. 220, p. 112266, 2025.
- Y. Tang, Y. Zhou, C. Yang, Y. Du, and M. S. Yang, "An instance gravity oversampling method for software defect prediction," Inf. Softw. Technol., vol. 179, p. 107657, 2025.
- B. Arasteh, K. Arasteh, A. Ghaffari, and R. Ghanbarzadeh, "A new binary chaos-based metaheuristic algorithm for software defect prediction," Cluster Comput., vol. 27, no. 7, pp. 10093–10123, 2024.
- F. Yang, G. Zeng, F. Zhong, P. Xiao, W. Zheng, and F. Qiu, "CfExplainer: Explainable just-in-time defect prediction based on counterfactuals," J. Syst. Softw., vol. 218, p. 112182, 2024.