Enhancing Cross-Project Defect Prediction Using Stacking Ensemble and Hybrid Sampling | International Journal of Innovative Science and Research Technology

Enhancing Cross-Project Defect Prediction Using Stacking Ensemble and Hybrid Sampling

Authors : Aliyu Bashir Nuhu; Dr. Yusuf Salisu Ibrahim

Volume/Issue : Volume 11 - 2026, Issue 5 - May

Google Scholar : https://tinyurl.com/ynaf9yt3

Scribd : https://tinyurl.com/5c4fa4by

DOI : https://doi.org/10.38124/ijisrt/26May202

Abstract : Software Defect Prediction (SDP) improves software quality by identifying defect-prone modules early in development. However, Cross-Project Defect Prediction (CPDP) remains challenging because of data heterogeneity across projects and severe class imbalance in defect datasets. Conventional machine learning models generalize poorly to new projects and detect the minority (defective) class poorly. This study aimed to develop and evaluate a Hybrid Ensemble Deep Learning framework that improves CPDP performance under heterogeneous and imbalanced conditions. The framework uses Random Forest and XGBoost as base learners in a stacked generalization architecture, with a Deep Neural Network as the meta-classifier. To address class imbalance and boundary noise, a SMOTE-Tomek hybrid sampling technique was integrated into the preprocessing pipeline. The model was evaluated using a Leave-One-Project-Out (LOPO) validation approach on five PROMISE datasets (CM1, JM1, KC1, MW1, and PC1).
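
The Leave-One-Project-Out protocol named in the abstract can be illustrated with a minimal sketch. The project names come from the abstract; the `lopo_splits` helper, the `Row` type, and the toy rows are hypothetical and only show the shape of the folds, not the authors' implementation. In a full pipeline, SMOTE-Tomek would presumably be applied to the training rows of each fold only, so that the held-out project stays untouched; the abstract does not state this detail.

```python
from typing import Dict, Iterator, List, Tuple

# Project names are taken from the abstract; everything else is illustrative.
PROJECTS = ["CM1", "JM1", "KC1", "MW1", "PC1"]

# Hypothetical row layout: (feature vector, defect label where 1 = defective).
Row = Tuple[List[float], int]


def lopo_splits(
    data: Dict[str, List[Row]],
) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held_out_name, train_rows, test_rows), one fold per project.

    Each project serves as the test set exactly once, while the rows of all
    remaining projects are pooled into the training set.
    """
    for held_out in data:
        train = [
            row
            for name, rows in data.items()
            if name != held_out
            for row in rows
        ]
        yield held_out, train, data[held_out]


# Made-up rows (three per project), only to demonstrate the protocol.
toy = {name: [([float(i), 0.0], i % 2) for i in range(3)] for name in PROJECTS}

folds = list(lopo_splits(toy))
for held_out, train, test in folds:
    print(f"test on {held_out}: {len(train)} training rows, {len(test)} test rows")
```

With five projects this produces five folds, so any reported score would be an aggregate over five train/test pairs, each testing on a project the model never saw during training.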

Keywords : Hybrid Ensemble Deep Learning; Cross-Project Defect Prediction; Federated Meta Learning; Software Defect Prediction; Ensemble Learning; Synthetic Minority Over-Sampling Technique; Machine Learning.
