Authors :
Akinsola Adeniyi F.; Sokunbi M. A.; Ogundele I. O.; Onadokun I. O.
Volume/Issue :
Volume 11 - 2026, Issue 2 - February
Google Scholar :
https://tinyurl.com/2n2d39mh
Scribd :
https://tinyurl.com/2hvmxcjv
DOI :
https://doi.org/10.38124/ijisrt/26feb1439
Abstract :
Stroke is one of the most pressing challenges in global public health: it is ranked as the second most
common cause of death and the third most common cause of long-term disability worldwide [1]. Early diagnosis is important
for lowering the risk of death and long-term disability from stroke. Traditional diagnostic methods such as
CT and MRI scans, however, can be expensive and time-consuming, and they require specialists to interpret them. These
constraints can delay treatment decisions, especially in low-resource settings that lack access to advanced imaging
technologies or qualified personnel. New approaches are therefore being studied to identify stroke rapidly and
efficiently. In this study, a stroke detection model was developed by applying machine learning to structured patient
data. Four supervised learning methods were compared: LightGBM, CatBoost, XGBoost, and Random Forest. Each method was
trained and tested using a 70/30 train/test split. Accuracy, Precision, Recall, F1-Score, and the Area Under the Receiver
Operating Characteristic Curve (AUC-ROC) were used as measures of model performance. Of the models compared, Random
Forest performed best, with an accuracy of 90% and an F1-Score of 0.95, and it outperformed the gradient boosting
methods on each of the most important performance metrics. These results indicate that machine learning algorithms,
particularly those based on ensemble techniques such as Random Forest, could enhance current diagnostic pathways by
providing faster, more scalable, and more accessible predictions of stroke risk than are currently available. This
could create opportunities for earlier clinical intervention and better patient outcomes, specifically in low-resource
health care settings. Machine learning tools should not supplant clinical expertise; however, if integrated into
routine practice, they could mark an important step toward using data more effectively and equitably when caring for
patients with stroke.
Keywords :
Stroke Detection, Machine Learning, Random Forest, Predictive Modeling, Healthcare AI
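The evaluation protocol named in the abstract (a 70/30 split scored with Accuracy, Precision, Recall, F1-Score, and AUC-ROC) can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' code: the function names (`split_70_30`, `classification_metrics`, `auc_roc`), the random seed, and the toy labels and scores are all assumptions made for the example. The abstract does not say whether the split was stratified or how the four models were configured, so no model training is shown here.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of `rows` and return (train, test) with a 70/30 split.

    The seed is an assumption for reproducibility; the abstract does not
    specify one, nor whether the split was stratified by class.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC-ROC computed as the fraction of (positive, negative) pairs in
    which the positive case receives the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and predicted stroke probabilities).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3, 0.7, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
metrics["auc_roc"] = auc_roc(y_true, scores)
print(metrics)
```

Precision, Recall, and F1-Score here refer to the positive (stroke) class only, while AUC-ROC is computed from the predicted scores rather than the thresholded labels, so it does not depend on the 0.5 cut-off.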
References :
- Valery L Feigin, Michael Brainin, Bo Norrving, Sheila O Martins, Jeyaraj Pandian, Patrice Lindsay, Maria F Grupper, Ilari Rautalin. World Stroke Organization, WSO Global Stroke Fact Sheet 2025. World Stroke Organization, 2025. DOI: 10.1177/17474930241308142
- GBD 2021 Stroke Collaborators, “Global, regional, and national burden of stroke and its risk factors, 1990–2021,” The Lancet Neurology, vol. 23, no. 4, pp. 345–367, 2024.
- Binbin Sui, Peiyi Gao, (2020), “Imaging evaluation of acute ischemic stroke,” Journal of International Medical Research, https://doi.org/10.1177/0300060518802530
- S. Dritsas and M. Trigka, (2022) “Stroke risk prediction with machine learning techniques,” Sensors, vol. 22, no. 13, doi: 10.3390/s22134670
- Nojood Alageel, Rahaf Alharbi, Rehab Alharbi, Lubna A. Alharbi, Maryam Alsayil (2023) “Using Machine Learning Algorithm as a Method for Improving Stroke Prediction,” International Journal of Advanced Computer Science and Applications. DOI: 10.14569/IJACSA.2023.0140481
- Senjuti Rahman, Mehedi Hasan, Ajay Sarkar, “Prediction of Brain Stroke Using Machine Learning Algorithms and Deep Neural Network Techniques,” European Journal of Electrical Engineering and Computer Science, 7(1):23-30, 2023. DOI: 10.24018/ejece.2023.7.1.483
- Mandeep Kaur, Sachin R. Sakhare, Kirti Wanjale, Farzana Akter, (2022) “Early Stroke Prediction Methods for Prevention of Strokes,” Behavioural Neurology, https://doi.org/10.1155/2022/7725597.
- R. Pitchai, Bhasker Dappuri, P. V. Pramila, M. Vidhyalakshmi, S. Shanthi, Wadi B. Alonazi, Khalid M. A. Almutairi, R. S. Sundaram, Ibsa Beyene. (2023). “An Artificial Intelligence-Based Bio-Medical Stroke Prediction and Analytical System Using a Machine Learning Approach,” Computational Intelligence and Neuroscience, https://doi.org/10.1155/2022/5489084
- Nouf Saeed Alotaibi, Abdullah Shawan Alotaibi, M. Eliazer, Asadi Srinivasulu (2022), “Detection of Ischemic Stroke Tissue Fate from the MRI Images Using a Deep Learning Approach,” Mobile Information Systems, https://doi.org/10.1155/2022/9399876.
- Soumyabrata Dev, Hewei Wang, Chidozie Shamrock Nwosu, Nishtha Jain, Bharadwaj Veeravalli, Deepu John (2022), “A Predictive Analytics Approach for Stroke Prediction Using Machine Learning and Neural Networks,” Healthcare Analytics, https://doi.org/10.1016/j.health.2022.100032
- Eman M Alanazi, Aalaa Abdou, Jake Luo (2021), “Predicting Risk of Stroke from Lab Tests Using Machine Learning Algorithms: Development and Evaluation of Prediction Models,” JMIR Formative Research 5(12), DOI: 10.2196/23440.
- JoonNyung Heo, Jihoon G Yoon, Hyungjong Park, Young Dae Kim, Hyo Suk Nam, Ji Hoe Heo (2019), “Machine Learning–Based Model for Prediction of Outcomes in Acute Stroke,” Stroke, vol. 50, no. 5, doi: 10.1161/STROKEAHA.118.024293
- L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001 https://doi.org/10.1023/A:1010933404324.
- G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
- L. Prokhorenkova, G. Gusev, A. Vorobev, A. Dorogush, and A. Gulin, “CatBoost: Unbiased boosting with categorical features,” in Advances in Neural Information Processing Systems (NeurIPS), 2018.
- T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 2016, pp. 785–794. https://doi.org/10.1145/2939672.29397
- D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, vol. 21, no. 1, p. 6, 2020