⚠ Official Notice: www.ijisrt.com is the official website of the International Journal of Innovative Science and Research Technology (IJISRT) Journal for research paper submission and publication. Please beware of fake or duplicate websites using the IJISRT name.



A Machine Learning-Based Framework for Software Defect Prediction Using Structural Software Metrics: An Empirical Analysis


Authors : Isaiah Ifeanyi Nweze; Paul Maduabuchi Agu; Ezekiel Nwibo Gabriel; Charles Ugwute; Chukwuka Abraham Nwovu; Boniface Mbalaba Ofoke

Volume/Issue : Volume 11 - 2026, Issue 5 - May


Google Scholar : https://tinyurl.com/4nmc4nhh

Scribd : https://tinyurl.com/45e8ps5m

DOI : https://doi.org/10.38124/ijisrt/26May670



Abstract : Software defects remain a critical challenge in modern software engineering: the growing structural complexity and interdependence of classes in object-oriented designs increase maintenance costs and reduce system reliability. This paper introduces a machine learning-based framework that predicts software defects from structural software metrics. A structured dataset of 145 software modules with 94 numerical features was analysed, focusing on six major metrics: Coupling Between Objects (CBO), Depth of Inheritance Tree (DIT), Lack of Cohesion of Methods (LCOM), Number of Children (NOC), Response for Class (RFC) and Weighted Methods per Class (WMC). Statistical analysis indicated substantial variation and skewness in the complexity-related measures, particularly CBO and RFC, suggesting the presence of structural outliers. Correlation analysis revealed significant associations among the coupling, cohesion, and response measures, indicating that defect proneness is driven not by single factors but by interacting structural properties. The dataset had a moderate class imbalance, with most modules being non-defective. To compare predictive performance, three supervised machine learning models (Decision Tree, Random Forest and Logistic Regression) were trained using a stratified 70:30 train-test split. The Random Forest model achieved the highest overall performance, with an accuracy of 72.73%, precision of 75%, recall of 50%, and an F1-score of 0.60, which the authors interpret as a good balance between classification accuracy and generalisation. Logistic Regression achieved higher precision but lower recall, whereas the Decision Tree model was less accurate but more interpretable. Overall, the results show that combining statistical analysis with machine learning provides an effective approach to early defect detection. The paper highlights the importance of structural complexity measures in identifying defect-prone modules and advocates ensemble learning algorithms to enhance software quality assurance and overall system reliability.
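
The Random Forest scores quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the confusion-matrix counts (TP=9, FP=3, FN=9, TN=23) are not reported in the abstract. They are an assumption, chosen as one set of counts for a 44-module test set (roughly 30% of 145) that reproduces the stated accuracy, precision, recall, and F1-score.

```python
# Assumed counts, NOT reported in the abstract: one confusion matrix for a
# hypothetical 44-module test set that matches the stated Random Forest scores.
TP, FP, FN, TN = 9, 3, 9, 23


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, a recall of 50% means that half of the defective modules in the test set go undetected, which is worth weighing alongside the headline accuracy.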

Keywords : Software Defect Prediction, Machine Learning, Random Forest, Decision Tree, Logistic Regression, Software Metrics, Classification.
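
The stratified 70:30 train-test split named in the abstract keeps the share of defective modules roughly equal in both partitions, which matters when the classes are imbalanced. A minimal standard-library sketch follows. The function name `stratified_split`, the seed, and the synthetic labels (42 defective out of 145) are illustrative assumptions, not the authors' code or data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=42):
    """Split indices into train/test so each class keeps about the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the test share from each class separately, then pool the parts.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Synthetic labels for illustration only: 1 = defective, 0 = non-defective.
labels = [1] * 42 + [0] * 103
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx))
print(f"defective share: train={train_rate:.3f}, test={test_rate:.3f}")
```

In practice, scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose; the hand-written version here only makes the per-class sampling explicit.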


