Personalized dosing of mood stabilizers remains challenging due to substantial inter-individual variability in symptom severity, treatment responsiveness, and vulnerability to adverse effects. Clinical titration is often slow and heuristic, motivating data-driven strategies that can adapt dosage over time while balancing efficacy and safety. In this study, we formulate mood stabilizer dosing as a finite-horizon Markov decision process and develop a reinforcement learning (RL) framework for patient-specific dosage adjustment. We simulate a heterogeneous cohort of 500 patients using clinically inspired latent factors (depression severity, anxiety level, treatment responsiveness, and side-effect sensitivity) and define mood trajectories through a nonlinear dose-response model centered on an individualized optimal dose. The RL agent selects among discrete dosage changes (−50, −25, 0, +25, +50 mg) across an 8-step titration horizon. A clinically motivated reward function encourages proximity to a target mood (50) and occupancy of a therapeutic range (45-55), while penalizing side effects and abrupt dose fluctuations and rewarding sustained stability. A tabular Q-learning agent trained for 1000 episodes achieved stable convergence with consistently positive episode returns (mean reward ≈ 63.6). In population-level evaluation over 200 held-out patients, the learned policy improved final mood by 20.3 points on average, with 68.5% of patients showing positive improvement. However, sustained stabilization remained limited: mean time spent in the therapeutic range was 16.8%, only 12.5% of patients achieved ≥50% occupancy, and 9.5% ended within range. Dose selection accuracy was modest, with 15.0% of final doses within 20% of the patient-specific optimum. Subgroup analysis indicated reduced stabilization performance in high-depression patients. Overall, the results demonstrate the feasibility of RL for adaptive psychiatric dosing in a controlled simulation and highlight key challenges (state abstraction, horizon length, and function approximation) that must be addressed before translational evaluation on real-world clinical data.
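The setup described in the abstract can be illustrated with a minimal Python sketch of tabular Q-learning over discrete dose changes. Only the action set (−50 to +50 mg), the 8-step horizon, the target mood (50), the therapeutic range (45-55), and the 1000 training episodes come from the abstract. Everything else is a hypothetical assumption made for illustration and is not the authors' model: the Gaussian-shaped dose-response curve, the patient parameter ranges, the reward weights, the state discretization, the dose bounds, and the learning hyperparameters. The stability bonus mentioned in the abstract is omitted for brevity.

```python
import math
import random

ACTIONS = [-50, -25, 0, 25, 50]        # dose changes in mg (from the abstract)
HORIZON = 8                            # titration steps (from the abstract)
TARGET, LOW, HIGH = 50.0, 45.0, 55.0   # target mood and therapeutic range


def make_patient(rng):
    """Sample a hypothetical patient; all ranges are illustrative assumptions."""
    return {
        "baseline": rng.uniform(20.0, 40.0),        # untreated mood score
        "optimal_dose": rng.uniform(100.0, 300.0),  # individualized optimum (mg)
        "response": rng.uniform(0.5, 1.0),          # treatment responsiveness
        "sensitivity": rng.uniform(0.0, 1.0),       # side-effect sensitivity
    }


def mood(patient, dose):
    """Assumed nonlinear dose-response: mood gain peaks at the optimal dose."""
    offset = (dose - patient["optimal_dose"]) / 100.0
    return patient["baseline"] + 30.0 * patient["response"] * math.exp(-offset ** 2)


def reward(patient, m, dose, delta):
    """Assumed reward: closeness to target, in-range bonus, two penalties."""
    r = -abs(m - TARGET) / 10.0                       # proximity to target mood
    if LOW <= m <= HIGH:
        r += 1.0                                      # therapeutic-range bonus
    r -= 0.5 * patient["sensitivity"] * dose / 400.0  # side-effect penalty
    r -= 0.2 * abs(delta) / 50.0                      # abrupt-change penalty
    return r


def encode(m, dose, t):
    """Coarse state abstraction: 5-point mood bins, 50 mg dose bins, step."""
    return (int(m // 5), int(dose // 50), t)


def train(episodes=1000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the Q-table as a dict."""
    rng = random.Random(seed)
    q = {}
    n = len(ACTIONS)
    for _ in range(episodes):
        patient = make_patient(rng)
        dose = 100.0                                  # assumed starting dose
        s = encode(mood(patient, dose), dose, 0)
        for t in range(HORIZON):
            qs = q.setdefault(s, [0.0] * n)
            if rng.random() < eps:
                a = rng.randrange(n)                  # explore
            else:
                a = max(range(n), key=qs.__getitem__) # exploit
            new_dose = min(400.0, max(0.0, dose + ACTIONS[a]))
            delta, dose = new_dose - dose, new_dose
            m = mood(patient, dose)
            s_next = encode(m, dose, t + 1)
            # No bootstrapping past the final step of the finite horizon.
            future = 0.0 if t + 1 == HORIZON else max(q.get(s_next, [0.0] * n))
            td_target = reward(patient, m, dose, delta) + gamma * future
            qs[a] += alpha * (td_target - qs[a])
            s = s_next
    return q
```

Calling `train()` returns a dictionary that maps each visited state to five action values; a greedy policy picks the dose change with the largest value in the current state. Including the time step in the state is one simple way to respect the finite horizon, at the cost of a larger table.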
Cite this paper
Filippis, R. D. and Foysal, A. A. (2026). Reinforcement Learning-Based Personalized Mood Stabilizer Dosage Optimization. Open Access Library Journal, 13, e14925. doi: http://dx.doi.org/10.4236/oalib.1114925.