A Risk-Aware Evaluation Framework for Reinforcement Learning-Based Adaptive Cancer Therapy


Authors : Mohammed Umar Alhaji; Usman Ahmad Ahmad; Nasir Muazu Abba

Volume/Issue : Volume 11 - 2026, Issue 1 - January


Google Scholar : https://tinyurl.com/yntn9dsp

Scribd : https://tinyurl.com/rnw6nmh9

DOI : https://doi.org/10.38124/ijisrt/26jan1026

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.


Abstract : Reinforcement learning has emerged as a promising approach for adaptive cancer therapy due to its ability to optimize sequential treatment decisions under uncertainty. While studies have demonstrated the potential of reinforcement learning to improve simulated treatment outcomes, most evaluations rely primarily on average performance metrics obtained through direct simulation rollouts. Such evaluation practices provide limited insight into uncertainty, robustness, and worst-case behavior, which are critical considerations in safety-sensitive clinical domains. This study proposes a standardized, risk-aware, and uncertainty-sensitive evaluation framework for reinforcement learning-based adaptive cancer therapy using simulated tumor environments. A Deep Q-Network policy is evaluated against clinically interpretable baselines from multiple performance perspectives, including mean outcomes, worst-case metrics, and tail-risk measures based on Conditional Value at Risk (CVaR). Robustness is further assessed under parameter perturbations and distribution shifts representing aggressive tumor dynamics. Experimental results demonstrate that adaptive reinforcement learning policies achieve tumor control comparable to maximum-dose therapy while maintaining controlled risk exposure and stable performance under uncertainty. The findings emphasize that rigorous, risk-sensitive evaluation is essential for drawing reliable conclusions about reinforcement learning-based treatment strategies before any real-world deployment.
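The tail-risk metrics named in the abstract can be made concrete with a short sketch. The Python snippet below is a minimal, illustrative implementation, not the authors' code: given arrays of simulated episode returns, it reports the mean, the worst observed case, and CVaR at level alpha, defined here as the mean of the worst alpha-fraction of returns (assuming lower return = worse clinical outcome). The `risk_summary` helper and the synthetic return distributions for the adaptive and maximum-dose policies are hypothetical placeholders.

```python
import numpy as np

def cvar(returns, alpha=0.10):
    """Mean of the worst alpha-fraction of episode returns.

    Assumes lower return = worse clinical outcome, so the left tail
    of the return distribution is the risky one.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * x.size)))  # number of tail samples
    return x[:k].mean()

def risk_summary(returns, alpha=0.10):
    """Report the three perspectives named in the abstract:
    mean outcome, worst observed case, and tail risk (CVaR)."""
    x = np.asarray(returns, dtype=float)
    return {
        "mean": float(x.mean()),
        "worst_case": float(x.min()),
        f"cvar_{int(alpha * 100)}": float(cvar(x, alpha)),
    }

# Synthetic placeholder returns: an adaptive policy with modest variance
# versus an aggressive baseline with a slightly higher mean but a heavier
# left tail. These numbers are illustrative only.
rng = np.random.default_rng(0)
adaptive_returns = rng.normal(loc=10.0, scale=2.0, size=500)
max_dose_returns = rng.normal(loc=10.5, scale=4.0, size=500)

print("adaptive:", risk_summary(adaptive_returns))
print("max dose:", risk_summary(max_dose_returns))
```

Under this convention, a policy with a slightly lower mean return but a much better CVaR can be the preferable choice in a safety-sensitive setting, which is exactly the kind of comparison the proposed framework is meant to standardize.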

Keywords : Reinforcement Learning; Adaptive Cancer Therapy; Risk-Aware Evaluation; Conditional Value at Risk; Simulation-Based Evaluation; Deep Q-Network.


