Advanced Investigation of Healthcare Fraud Detection Utilizing Machine Learning Algorithms


Authors : Saeed H. A.; Naeem N. Alyona A. E.

Volume/Issue : Volume 10 - 2025, Issue 2 - February


Google Scholar : https://tinyurl.com/59r4zsvk

DOI : https://doi.org/10.38124/ijisrt/25Feb1337



Abstract : Healthcare fraud is a growing problem that causes substantial financial losses and degrades the quality of patient care. Conventional fraud detection techniques often fail to identify fraudulent claims because healthcare data is complex and high in volume. This study investigates machine learning methods for improving fraud detection in healthcare systems. We compare the performance of Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and Logistic Regression before and after hyperparameter tuning and feature selection. Forward feature selection was applied to KNN and Logistic Regression to retain the most informative features, and hyperparameter tuning was applied to all models. The models were evaluated using accuracy, precision, recall, F1-score, confusion matrices, and ROC curves. After tuning and feature selection, Logistic Regression achieved the highest accuracy of the models compared in identifying fraudulent claims. A Voting Classifier, an ensemble method, improved fraud detection by aggregating the predictions of several models. Decision Tree and Random Forest performed well, but tuning did not improve their accuracy. These results indicate that machine learning methods, especially ensemble models and feature selection, can substantially improve healthcare fraud detection. Future work should integrate deep learning and advanced ensemble techniques to further improve detection accuracy and reduce false positives.
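The workflow summarized in the abstract (forward feature selection for KNN and Logistic Regression, hyperparameter tuning for all four models, then a voting ensemble) could be sketched roughly as follows. This is an illustration only, not the authors' code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the claims data, and the parameter grids, the number of selected features, and the helper `with_forward_selection` are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for claims data (assumption: not the paper's dataset).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def with_forward_selection(estimator):
    """Hypothetical helper: scale, keep 4 features chosen greedily, then classify."""
    selector = SequentialFeatureSelector(
        estimator, n_features_to_select=4, direction="forward", cv=3)
    return Pipeline([("scale", StandardScaler()),
                     ("select", selector),
                     ("clf", estimator)])


# Models and illustrative (assumed) hyperparameter grids.
# The grids tune the final classifier; the selector scores features with default settings.
candidates = {
    "tree": (DecisionTreeClassifier(random_state=0),
             {"max_depth": [3, 5, None]}),
    "forest": (RandomForestClassifier(n_estimators=50, random_state=0),
               {"max_depth": [3, 5, None]}),
    "knn": (with_forward_selection(KNeighborsClassifier()),
            {"clf__n_neighbors": [3, 5, 7]}),
    "logreg": (with_forward_selection(LogisticRegression(max_iter=1000)),
               {"clf__C": [0.1, 1.0, 10.0]}),
}

tuned, scores = {}, {}
for name, (model, grid) in candidates.items():
    # Cross-validated grid search on the training split only.
    search = GridSearchCV(model, grid, cv=3, scoring="accuracy")
    tuned[name] = search.fit(X_train, y_train).best_estimator_
    scores[name] = accuracy_score(y_test, tuned[name].predict(X_test))

# Soft voting averages the predicted class probabilities of the tuned models.
voter = VotingClassifier(list(tuned.items()), voting="soft").fit(X_train, y_train)
scores["voting"] = accuracy_score(y_test, voter.predict(X_test))

for name, acc in scores.items():
    print(f"{name:>7}: test accuracy = {acc:.3f}")
```

The numbers printed on synthetic data say nothing about the paper's findings; the sketch only shows how the pieces fit together. Accuracy is used as the tuning score to mirror the abstract, although on imbalanced fraud data a recall- or F1-based score may be a more informative choice.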

Keywords : Healthcare Fraud, Machine Learning, Fraud Detection, Hyperparameter Tuning, Feature Selection, Ensemble Learning.
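For reference, the evaluation metrics named in the abstract all derive from the four confusion-matrix counts. The sketch below computes them in plain Python on a small made-up set of labels (1 = fraudulent, 0 = legitimate); the function name `binary_metrics` and the example labels are assumptions for illustration, not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts and derived metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 3 fraudulent claims out of 10, one missed and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

In this example accuracy is 0.8 while precision and recall are both 2/3, which illustrates why accuracy alone can overstate performance when fraudulent claims are rare.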


