Authors :
Aliyu Bashir Nuhu; Dr. Yusuf Salisu Ibrahim
Volume/Issue :
Volume 11 - 2026, Issue 5 - May
Google Scholar :
https://tinyurl.com/ynaf9yt3
Scribd :
https://tinyurl.com/5c4fa4by
DOI :
https://doi.org/10.38124/ijisrt/26May202
Abstract :
Software Defect Prediction (SDP) improves software quality by identifying defect-prone modules early in
development. However, Cross-Project Defect Prediction (CPDP) remains challenging because of data heterogeneity across
projects and severe class imbalance in defect datasets. Conventional machine learning models generalize poorly to
new projects and detect the minority (defective) class poorly. This study aimed to develop and evaluate a
Hybrid Ensemble Deep Learning framework that improves CPDP performance under
heterogeneous and imbalanced conditions. The framework combines Random Forest and XGBoost as base learners in a
stacked generalization architecture, with a Deep Neural Network serving as the meta-classifier. To address class
imbalance and boundary noise, a SMOTE-Tomek hybrid sampling technique was integrated into the preprocessing pipeline.
The model was evaluated using a Leave-One-Project-Out (LOPO) validation approach on five PROMISE datasets
(CM1, JM1, KC1, MW1, and PC1).
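
The Leave-One-Project-Out protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `lopo_splits`, `Sample`, and `Resampler` are hypothetical, the toy rows are placeholders rather than real PROMISE metrics, and the `resample` hook only marks where a SMOTE-Tomek step would plausibly sit. The sketch assumes resampling is applied to the pooled training projects only, so the held-out project keeps its original class distribution.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# One row per software module: (feature vector, label), label 1 = defective.
Sample = Tuple[List[float], int]
Resampler = Callable[[List[Sample]], List[Sample]]


def lopo_splits(
    projects: Dict[str, List[Sample]],
    resample: Resampler = lambda rows: rows,
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_name, train_rows, test_rows), one fold per project.

    The resampler (a stand-in for a SMOTE-Tomek step) sees only the pooled
    training rows; the held-out project is passed through unchanged.
    """
    for held_out in projects:
        # Pool every project except the one being held out for testing.
        train = [row
                 for name, rows in projects.items() if name != held_out
                 for row in rows]
        yield held_out, resample(train), list(projects[held_out])


if __name__ == "__main__":
    # Toy placeholder data, NOT the real PROMISE datasets.
    toy = {name: [([float(i)], i % 2) for i in range(3)]
           for name in ("CM1", "JM1", "KC1", "MW1", "PC1")}
    for name, train, test in lopo_splits(toy):
        print(name, len(train), len(test))
```

With five projects this produces five folds, each training on the four remaining projects and testing on the one held out. In a fuller pipeline, the stacked base learners and the meta-classifier would be fitted on `train` and scored on `test` inside the loop.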
Keywords :
Hybrid Ensemble Deep Learning; Cross-Project Defect Prediction; Federated Meta Learning; Software Defect Prediction; Ensemble Learning; Synthetic Minority Over-Sampling Technique; Machine Learning.