Treatment selection for bipolar depression remains largely trial-and-error, with substantial non-response to first-line strategies and a clinically meaningful risk of mood destabilization. We developed a deep reinforcement learning (RL) framework to optimize treatment selection while explicitly penalizing destabilization events. We implemented RL-CADENCE, a simulated multi-centre experimental framework designed to emulate a parallel-group randomized trial across 12 virtual psychiatric centres. Using a combination of publicly available online data sources and clinically informed synthetic generation, we constructed a cohort of 2500 virtual participants representing bipolar spectrum disorders. Virtual participants were algorithmically allocated (3:3:3:1) to four treatment strategies: 1) lithium + SSRI, 2) quetiapine + lamotrigine, 3) lurasidone + mood stabilizer (lithium or valproate), or 4) RL-personalized treatment selection. The primary endpoint was the simulated change in Montgomery-Åsberg Depression Rating Scale (MADRS) score over 12 months. Secondary outcomes included response, mood destabilization events, and quality-adjusted life years (QALYs). A causal machine learning pipeline estimated conditional average treatment effects (CATE) to characterize heterogeneity across subgroups within the synthetic cohort. In simulation, the RL-personalized strategy achieved greater MADRS improvement than pooled standard protocols (mean difference: −5.6 points; 95% CI: −7.6 to −3.6; Cohen’s d = 0.78). Simulated response rates (≥50% MADRS reduction) were 95.7% versus 58.9%, and mood destabilization occurred in 4.8% versus 10.8% of synthetic patient-months. The RL policy network achieved an AUC-ROC of 0.89 for predicting the optimal treatment strategy under the simulated counterfactual evaluation. Heterogeneous effects were largest in participants with mixed features (CATE = 15.2; 95% CI: 8.9 to 21.5) and in the bipolar I subtype (CATE = 12.3; 95% CI: 7.1 to 17.5).
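The allocation ratio and the response criterion stated above can be made concrete with a minimal sketch. The abstract does not describe how allocation was implemented, so the exact-ratio shuffle below is an assumption, and the arm labels and function names (`allocate`, `is_responder`) are hypothetical. The last line is a back-of-envelope check that assumes Cohen's d was computed as the mean difference divided by a pooled standard deviation.

```python
import random
from collections import Counter

# Hypothetical arm labels for the four strategies named in the abstract.
ARMS = [
    "lithium_ssri",
    "quetiapine_lamotrigine",
    "lurasidone_mood_stabilizer",
    "rl_personalized",
]
RATIO = [3, 3, 3, 1]  # 3:3:3:1 allocation, as stated in the abstract


def allocate(n_participants, ratio=RATIO, seed=0):
    """Exact-ratio allocation: build the full list of arm labels, then shuffle.

    This is an assumed scheme; the source only says allocation was algorithmic.
    """
    total = sum(ratio)
    if n_participants % total != 0:
        raise ValueError("n_participants must be a multiple of the ratio sum")
    per_unit = n_participants // total
    assignments = [
        arm for arm, weight in zip(ARMS, ratio) for _ in range(weight * per_unit)
    ]
    random.Random(seed).shuffle(assignments)
    return assignments


def is_responder(baseline_madrs, final_madrs):
    """Response criterion from the abstract: at least a 50% MADRS reduction."""
    if baseline_madrs <= 0:
        raise ValueError("baseline MADRS must be positive")
    return (baseline_madrs - final_madrs) / baseline_madrs >= 0.5


assignments = allocate(2500)
arm_sizes = Counter(assignments)  # 750 per standard arm, 250 in the RL arm

# Pooled SD implied by the reported effect size, assuming d = |mean diff| / SD.
implied_sd = 5.6 / 0.78  # roughly 7.18 MADRS points
```

Under this scheme the RL-personalized arm contains 250 of the 2500 virtual participants, so the pooled comparison sets 250 RL-guided trajectories against 2250 standard-protocol trajectories.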
Within a simulated, synthetic-data evaluation, deep RL showed strong potential to personalize antidepressant-related treatment selection in bipolar spectrum disorders, improving depressive symptom outcomes while reducing destabilization risk. These findings provide proof-of-concept for RL-based precision psychiatry and motivate prospective validation in real-world clinical cohorts.
Subject Areas
Psychiatry & Psychology
Cite this paper
Filippis, R. D. and Foysal, A. A. (2026). Deep Reinforcement Learning for Personalized Antidepressant Decision Support in Bipolar Spectrum Disorders: Simulated Randomized Trial Framework. Open Access Library Journal, 13, e15137. doi: http://dx.doi.org/10.4236/oalib.1115137.