Detecting behavioural signatures of depression from everyday digital traces is a central challenge in computational psychiatry. Real-world datasets from smartphones and wearables often suffer from sparse labels, heterogeneous sampling, and highly imbalanced case-control ratios, which limits the development of robust models. To explore these challenges under controlled conditions, we construct a clinically inspired synthetic dataset of daily behavioural trajectories for 200 virtual subjects monitored over 30 days. For each subject-day, we simulate multivariate digital phenotyping features including sleep duration, physical activity, social interactions, and diurnal phone usage. Subject-level depression labels are defined via PHQ-9 score distributions aligned with standard clinical thresholds. We then evaluate whether a lightweight self-supervised learning (SSL) encoder can derive latent representations that differentiate depressed from healthy subjects more effectively than naïve raw features. The SSL model operates on full 30-day sequences and is trained using a contrastive NT-Xent objective combined with a reconstruction term. The resulting embeddings are fed into multiple downstream classifiers (Random Forest, XGBoost, SVM, and Logistic Regression). Across all models, SSL features consistently outperform raw handcrafted aggregates in AUC, with clear improvements in discriminability and calibration. Analyses of behavioural distributions, temporal trajectories and correlations, classifier performance and ROC curves, weekly rhythms, cluster-level archetypes, and UMAP projections of the latent space jointly indicate that depression is expressed not as simple magnitude shifts in single features but as distributed, temporally structured deviations.
This work contributes a mathematically explicit synthetic benchmark and demonstrates that compact SSL encoders can learn clinically meaningful representations of mental-health–related behaviour even in noisy, imbalanced settings, providing a foundation for future real-world digital phenotyping pipelines.
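The contrastive part of the training objective mentioned in the abstract can be illustrated with a minimal NumPy sketch of the standard NT-Xent loss. This is not the authors' implementation: the function name `nt_xent_loss`, the temperature value of 0.5, and the assumption that `z_a[i]` and `z_b[i]` are embeddings of two augmented views of the same subject's 30-day sequence are all illustrative choices, and the reconstruction term is omitted because the abstract does not specify how it is weighted.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent loss for two batches of paired embeddings.

    z_a, z_b: arrays of shape (n, d); row i of each is assumed to be
    an embedding of a different view of the same subject.
    temperature: hypothetical default, not taken from the paper.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)

    sim = z @ z.T / temperature
    # A sample must not be contrasted with itself.
    np.fill_diagonal(sim, -np.inf)

    # Row-wise log-softmax, shifted by the row maximum for stability.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # The positive for row i is row i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    return -log_prob[np.arange(2 * n), pos].mean()
```

As a quick sanity check, calling `nt_xent_loss(z, z)` on a batch of random embeddings should give a lower loss than calling it with two unrelated random batches, since identical views are maximally similar to their positives.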
Cite this paper
Filippis, R. D. and Foysal, A. A. (2026). A Lightweight Self-Supervised Representation Learning Framework for Depression Risk Profiling from Synthetic Daily Behavioural Trajectories. Open Access Library Journal, 13, e14918. doi: http://dx.doi.org/10.4236/oalib.1114918.