Accurate Prediction of Heart Disease Using Machine Learning: A Case Study on the Cleveland Dataset


Authors : Nikhil Sanjay Suryawanshi

Volume/Issue : Volume 9 - 2024, Issue 7 - July

Google Scholar : https://tinyurl.com/2wxun45r

Scribd : https://tinyurl.com/2ybxxtcc

DOI : https://doi.org/10.38124/ijisrt/IJISRT24JUL1400

Abstract : Heart disease remains one of the leading causes of mortality worldwide, and its diagnosis and treatment present significant challenges, particularly in developing nations. These challenges stem from a scarcity of effective diagnostic tools, a lack of qualified medical personnel, and other factors that hinder accurate prognosis and timely treatment. Although many cardiac disorders are preventable, their incidence continues to rise, primarily because of inadequate preventive measures and a shortage of skilled medical providers. In this study, we propose a novel approach to improve the accuracy of cardiovascular disease prediction by identifying critical features using machine learning techniques. Using the Cleveland Heart Disease dataset, we explore various feature combinations and implement multiple well-known classification strategies. By integrating a Voting Classifier ensemble that combines Logistic Regression, Gradient Boosting, and Support Vector Machine (SVM) models, we build a robust prediction model for heart disease. This hybrid approach achieves an accuracy of 97.9%, significantly improving the accuracy of cardiovascular disease prediction and offering a valuable tool for early diagnosis and treatment.

Keywords : Heart Disease Prediction, Cardiovascular Disease, Machine Learning, Ensemble Learning, Logistic Regression, Gradient Boosting, Support Vector Machine (SVM), Hybrid Models, Voting Classifier, Cleveland Dataset.
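
The ensemble described in the abstract can be illustrated with a minimal scikit-learn sketch. The abstract names only the three model families and the Voting Classifier; the voting scheme (hard voting here), the feature scaling, the hyperparameters, and the 80/20 split are assumptions made for illustration. The data is a synthetic stand-in with the same shape as the commonly used Cleveland subset (303 rows, 13 features), so this sketch does not reproduce the reported 97.9% accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_voting_ensemble(random_state=0):
    """Majority-vote ensemble of the three model families named in the abstract.

    Hyperparameters and scaling are illustrative assumptions, not the paper's.
    """
    # Logistic Regression and SVM are scale-sensitive, so each gets its own scaler.
    log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    grad_boost = GradientBoostingClassifier(random_state=random_state)
    svm = make_pipeline(StandardScaler(), SVC())
    return VotingClassifier(
        estimators=[("lr", log_reg), ("gb", grad_boost), ("svm", svm)],
        voting="hard",  # assumption: each model casts one vote, majority wins
    )


# Synthetic stand-in for the Cleveland data (assumption: 303 rows, 13 features).
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

ensemble = build_voting_ensemble()
ensemble.fit(X_train, y_train)
test_accuracy = ensemble.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

To try this on the real dataset, `X` and `y` would be replaced by the 13 clinical attributes and the binarized diagnosis label from the UCI repository; cross-validation would give a more stable estimate than a single split on so few samples.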

