Reinforcement Learning-Based Personalized Mood Stabilizer Dosage Optimization

doi:10.4236/oalib.1114925

ISSN: 2333-9721

Open Access Library Journal, Vol. 13, 2026



DOI: 10.4236/oalib.1114925, PP. 1-21

Rocco de Filippis, Abdullah Al Foysal

Subject Areas: Artificial Intelligence, Psychiatry & Psychology

Keywords: Reinforcement Learning, Translational Evaluation, Clinical Data


Abstract

Personalized dosing of mood stabilizers remains challenging because of substantial inter-individual variability in symptom severity, treatment responsiveness, and vulnerability to adverse effects. Clinical titration is often slow and heuristic, which motivates data-driven strategies that adapt dosage over time while balancing efficacy and safety. In this study, we formulate mood stabilizer dosing as a finite-horizon Markov decision process and develop a reinforcement learning (RL) framework for patient-specific dosage adjustment. We simulate a heterogeneous cohort of 500 patients using four clinically inspired latent factors (depression severity, anxiety level, treatment responsiveness, and side-effect sensitivity) and define mood trajectories through a nonlinear dose-response model centered on an individualized optimal dose. The RL agent selects among discrete dosage changes (−50, −25, 0, 25, 50 mg) across an 8-step titration horizon. A clinically motivated reward function encourages proximity to a target mood (50) and occupancy of a therapeutic range (45-55), penalizes side effects and abrupt dose fluctuations, and rewards sustained stability. A tabular Q-learning agent trained for 1000 episodes achieved stable convergence with consistently positive episode returns (mean reward ≈ 63.6). In population-level evaluation over 200 held-out patients, the learned policy improved final mood by 20.3 points on average, with 68.5% of patients showing positive improvement. However, sustained stabilization remained limited: mean time spent in the therapeutic range was 16.8%, only 12.5% of patients achieved ≥50% occupancy, and 9.5% ended within range. Dose selection accuracy was modest, with 15.0% of final doses within 20% of the patient-specific optimum. Subgroup analysis indicated reduced stabilization performance in high-depression patients.
Overall, the results demonstrate the feasibility of RL for adaptive psychiatric dosing in a controlled simulation and highlight key issues (state abstraction, horizon length, and function approximation) that must be addressed before translational evaluation on real-world clinical data.
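
The setup described in the abstract can be illustrated with a minimal sketch of tabular Q-learning on a toy dosing simulator. Only the action set, the 8-step horizon, the target mood of 50, and the 45-55 range come from the abstract. Everything else is an assumption made for illustration and is not the authors' model: the patient distributions, the bell-shaped dose-response curve, the reward weights, the state binning, the starting dose and dose bounds, and the hyperparameters. The anxiety and depression factors and the sustained-stability bonus are omitted for brevity.

```python
import math
import random
from collections import defaultdict

ACTIONS = (-50, -25, 0, 25, 50)        # dose changes in mg (from the abstract)
HORIZON = 8                            # titration steps (from the abstract)
TARGET, LOW, HIGH = 50.0, 45.0, 55.0   # target mood and therapeutic range

def sample_patient(rng):
    """Hypothetical patient; the distributions below are assumed."""
    return {
        "baseline": rng.uniform(20.0, 40.0),        # assumed untreated mood
        "optimal_dose": rng.uniform(100.0, 400.0),  # assumed optimum in mg
        "response": rng.uniform(0.5, 1.0),          # assumed responsiveness
        "sensitivity": rng.uniform(0.0, 1.0),       # assumed side-effect sensitivity
    }

def mood(patient, dose):
    """Assumed bell-shaped dose-response centered on the optimal dose."""
    gap = (dose - patient["optimal_dose"]) / 100.0
    return patient["baseline"] + 30.0 * patient["response"] * math.exp(-gap * gap)

def reward(patient, m, dose, delta):
    """Assumed weights: target proximity, range bonus, two penalties."""
    r = -abs(m - TARGET) / 10.0                   # closer to target is better
    if LOW <= m <= HIGH:
        r += 1.0                                  # bonus inside therapeutic range
    r -= patient["sensitivity"] * dose / 400.0    # side-effect penalty
    r -= abs(delta) / 100.0                       # abrupt-change penalty
    return r

def encode(t, m, dose):
    """Assumed state abstraction: step index, 10-point mood bin, 50 mg dose bin."""
    return (t, int(m // 10), int(dose // 50))

def train(episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    returns = []
    for _ in range(episodes):
        patient = sample_patient(rng)
        dose = 100.0                              # assumed starting dose
        m = mood(patient, dose)
        total = 0.0
        for t in range(HORIZON):
            s = encode(t, m, dose)
            if rng.random() < epsilon:            # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=q[s].__getitem__)
            new_dose = min(max(dose + ACTIONS[a], 0.0), 500.0)  # assumed bounds
            new_m = mood(patient, new_dose)
            r = reward(patient, new_m, new_dose, new_dose - dose)
            # No bootstrapping past the final step of the finite horizon.
            nxt = 0.0 if t + 1 == HORIZON else max(q[encode(t + 1, new_m, new_dose)])
            q[s][a] += alpha * (r + gamma * nxt - q[s][a])
            dose, m, total = new_dose, new_m, total + r
        returns.append(total)
    return q, returns

if __name__ == "__main__":
    q_table, episode_returns = train()
    print(f"visited states: {len(q_table)}")
    print(f"mean return, last 100 episodes: {sum(episode_returns[-100:]) / 100:.2f}")
```

Because the step index is part of the state, the learned values are time-dependent, which fits a finite-horizon formulation. The numbers this toy produces are not comparable to the results reported in the abstract.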

Cite this paper

Filippis, R. D. and Foysal, A. A. (2026). Reinforcement Learning-Based Personalized Mood Stabilizer Dosage Optimization. Open Access Library Journal, 13, e14925. doi: http://dx.doi.org/10.4236/oalib.1114925.

