%0 Journal Article %T Reinforcement Learning-Based Personalized Mood Stabilizer Dosage Optimization %A Rocco de Filippis %A Abdullah Al Foysal %J Open Access Library Journal %V 13 %N 3 %P 1-21 %@ 2333-9721 %D 2026 %I Open Access Library %R 10.4236/oalib.1114925 %X Personalized dosing of mood stabilizers remains challenging due to substantial inter-individual variability in symptom severity, treatment responsiveness, and vulnerability to adverse effects. Clinical titration is often slow and heuristic, motivating data-driven strategies that can adapt dosage over time while balancing efficacy and safety. In this study, we formulate mood stabilizer dosing as a finite-horizon Markov decision process and develop a reinforcement learning (RL) framework for patient-specific dosage adjustment. We simulate a heterogeneous cohort of 500 patients using clinically inspired latent factors (depression severity, anxiety level, treatment responsiveness, and side-effect sensitivity) and define mood trajectories through a nonlinear dose-response model centered on an individualized optimal dose. The RL agent selects among discrete dosage changes (−50, −25, 0, 25, 50 mg) across an 8-step titration horizon. A clinically motivated reward function encourages proximity to a target mood (50) and occupancy of a therapeutic range (45-55), while penalizing side effects and abrupt dose fluctuations and rewarding sustained stability. A tabular Q-learning agent trained for 1000 episodes achieved stable convergence with consistently positive episode returns (mean reward ≈ 63.6). In population-level evaluation over 200 held-out patients, the learned policy improved final mood by 20.3 points on average, with 68.5% of patients showing positive improvement. However, sustained stabilization remained limited: mean time spent in the therapeutic range was 16.8%, only 12.5% of patients achieved ≥50% occupancy, and 9.5% ended within range.
Dose selection accuracy was modest, with 15.0% of final doses within 20% of the patient-specific optimum. Subgroup analysis indicated reduced stabilization performance in high-depression patients. Overall, results demonstrate the feasibility of RL for adaptive psychiatric dosing in a controlled simulation and highlight key challenges (state abstraction, horizon length, and function approximation) that must be addressed before translational evaluation on real-world clinical data. %K Reinforcement Learning %K Translational Evaluation %K Clinical Data %U http://www.oalib.com/paper/6888130
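The abstract describes a tabular Q-learning agent choosing among five discrete dose changes over an 8-step titration horizon, with reward shaped by proximity to a target mood of 50 and a therapeutic range of 45-55. The following is a minimal illustrative sketch of that setup, not the authors' implementation: the dose-response function, noise-free patient model, penalty weights, and state discretization are all assumptions made for the example.

```python
import random

# Discrete dose changes (mg) and the 8-step titration horizon from the abstract.
ACTIONS = [-50, -25, 0, 25, 50]
HORIZON = 8
TARGET_MOOD, RANGE_LO, RANGE_HI = 50.0, 45.0, 55.0

def mood_response(dose, optimal_dose, responsiveness):
    # Assumed nonlinear dose-response: mood peaks at the patient's optimal dose.
    return TARGET_MOOD * responsiveness * (1.0 - ((dose - optimal_dose) / 200.0) ** 2)

def reward(mood, dose_change):
    # Clinically motivated shaping: proximity to target, in-range bonus,
    # penalty for abrupt dose changes. Weights here are illustrative.
    r = -abs(mood - TARGET_MOOD) / 10.0
    if RANGE_LO <= mood <= RANGE_HI:
        r += 5.0
    r -= 0.01 * abs(dose_change)
    return r

def discretize(dose, mood):
    # Coarse state abstraction (dose bucket x mood bucket) for the tabular agent.
    return (int(dose // 25), int(mood // 5))

def train(episodes=1000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning over simulated titration episodes."""
    rng = random.Random(seed)
    Q = {}  # (state, action_index) -> estimated value
    for _ in range(episodes):
        optimal = rng.uniform(100.0, 400.0)  # patient-specific optimal dose
        resp = rng.uniform(0.8, 1.2)         # treatment responsiveness
        dose = 200.0                         # common starting dose (assumption)
        for _ in range(HORIZON):
            s = discretize(dose, mood_response(dose, optimal, resp))
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
            new_dose = min(500.0, max(0.0, dose + ACTIONS[a]))
            new_mood = mood_response(new_dose, optimal, resp)
            r = reward(new_mood, ACTIONS[a])
            s2 = discretize(new_dose, new_mood)
            best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)  # Q-learning update
            dose = new_dose
    return Q

Q = train(episodes=200)
```

The trained table can then drive a greedy dosing policy by taking `argmax` over actions at each discretized state; the paper's population-level evaluation on held-out simulated patients would correspond to rolling such a policy forward with exploration disabled.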