A Lightweight Self-Supervised Representation Learning Framework for Depression Risk Profiling from Synthetic Daily Behavioural Trajectories

doi:10.4236/oalib.1114918

OALib Journal
ISSN: 2333-9721
Fee: US$99


Open Access Library Journal, Vol. 13, 2026


DOI: 10.4236/oalib.1114918, pp. 1-20

Rocco de Filippis, Abdullah Al Foysal

Subject Areas: Artificial Intelligence, Psychiatry & Psychology

Keywords: Digital Phenotyping, Self-Supervised Learning, Depression Detection, Synthetic Behavioural Data, Temporal Representation Learning, Lightweight Deep Models


Abstract

Detecting behavioural signatures of depression from everyday digital traces is a central challenge in computational psychiatry. Real-world datasets from smartphones and wearables often suffer from sparse labels, heterogeneous sampling, and highly imbalanced case-control ratios, which limits the development of robust models. To explore these challenges under controlled conditions, we construct a clinically inspired synthetic dataset of daily behavioural trajectories for 200 virtual subjects monitored over 30 days. For each subject-day, we simulate multivariate digital phenotyping features including sleep duration, physical activity, social interactions, and diurnal phone usage. Subject-level depression labels are derived from PHQ-9 score distributions aligned with standard clinical thresholds. We then evaluate whether a lightweight self-supervised learning (SSL) encoder can derive latent representations that separate depressed from healthy subjects more effectively than naïve raw features. The SSL model operates on full 30-day sequences and is trained with a contrastive NT-Xent objective combined with a reconstruction term. The resulting embeddings are fed into four downstream classifiers (Random Forest, XGBoost, SVM, and Logistic Regression). Across all four classifiers, SSL features consistently achieve higher AUC than raw handcrafted aggregates, with clear improvements in discriminability and calibration. Analyses of behavioural distributions, temporal trajectories and correlations, classifier performance and ROC curves, weekly rhythms, cluster-level archetypes, and UMAP projections of the latent space jointly show that depression is expressed not as simple magnitude shifts in single features but as distributed, temporally structured deviations.
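The abstract names the training objective (a contrastive NT-Xent term plus a reconstruction term) but does not give its form. As an illustrative sketch only, the code below assumes the standard SimCLR formulation of NT-Xent with cosine similarity and a temperature `tau`, plus a mean-squared reconstruction error weighted by a hypothetical coefficient `lam`. The authors' encoder, decoder, augmentations and weighting are not specified here, so `z1` and `z2` (embeddings of two augmented views of each subject's 30-day sequence) and `x_hat` (the reconstruction) are simply taken as inputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss (assumed form, not the authors' code).

    z1[i] and z2[i] embed two augmented views of subject i's sequence;
    the other 2N - 2 embeddings in the batch act as negatives.
    """
    z = z1 + z2                      # concatenate views: 2N embeddings
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)        # index of i's positive partner
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        m = max(logits)              # max-shifted log-sum-exp for stability
        lse = m + math.log(sum(math.exp(s - m) for s in logits))
        total += lse - cosine(z[i], z[j]) / tau   # -log softmax of positive
    return total / (2 * n)


def mse(x, x_hat):
    """Mean squared reconstruction error over a batch of flattened sequences."""
    errs = [(a - b) ** 2 for row, row_hat in zip(x, x_hat)
            for a, b in zip(row, row_hat)]
    return sum(errs) / len(errs)


def ssl_loss(z1, z2, x, x_hat, tau=0.5, lam=1.0):
    """Hypothetical combined objective: NT-Xent + lam * reconstruction MSE."""
    return nt_xent(z1, z2, tau) + lam * mse(x, x_hat)


if __name__ == "__main__":
    # Two toy subjects whose views coincide vs. a batch with swapped partners.
    views = [[1.0, 0.0], [0.0, 1.0]]
    print(nt_xent(views, views))                       # log(2 + e^2) - 2, about 0.24
    print(nt_xent(views, [[0.0, 1.0], [1.0, 0.0]]))    # log(2 + e^2), about 2.24
```

Each anchor is scored against its positive partner and every other embedding in the batch, so the contrastive term falls as the two views of the same subject move together and away from other subjects, while the reconstruction term keeps the embedding informative about the raw sequence.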
This work contributes a mathematically explicit synthetic benchmark and demonstrates that compact SSL encoders can learn clinically meaningful representations of mental-health–related behaviour even in noisy, imbalanced settings, providing a foundation for future real-world digital phenotyping pipelines.
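The synthetic benchmark is described here only qualitatively. The sketch below shows one way such data could be produced under explicitly hypothetical assumptions: a PHQ-9 score per virtual subject drawn from a clipped Gaussian, the conventional PHQ-9 cutoff of 10 for a positive label, and severity-dependent shifts in four daily features with a simple weekend effect. The distributions, coefficients and helper names (`simulate_subject`, `make_dataset`, `mean_features`, `auc`) are illustrative choices, not the authors' generator. The sketch also computes a rank-based (Mann-Whitney) AUC for one raw handcrafted aggregate, the kind of baseline the SSL embeddings are compared against.

```python
import random

PHQ9_CUTOFF = 10  # conventional PHQ-9 threshold for moderate (or worse) depression
FEATURES = ["sleep_h", "steps", "social_contacts", "night_phone_min"]


def simulate_subject(rng, n_days=30):
    """Return (phq9, label, days) for one hypothetical virtual subject.

    All means, slopes and noise levels below are illustrative assumptions.
    """
    phq9 = min(27, max(0, round(rng.gauss(7, 5))))   # PHQ-9 range is 0-27
    label = int(phq9 >= PHQ9_CUTOFF)
    sev = phq9 / 27.0                                  # severity scaled to [0, 1]
    days = []
    for d in range(n_days):
        weekend = d % 7 in (5, 6)                      # day 0 assumed a Monday
        # shorter, more variable sleep with severity; longer sleep at weekends
        sleep = rng.gauss(7.2 - 1.0 * sev + (0.5 if weekend else 0.0), 0.8 + sev)
        steps = max(0.0, rng.gauss(8000 - 4000 * sev, 1500))
        social = max(0, round(rng.gauss(12 - 6 * sev, 3)))
        night_phone = max(0.0, rng.gauss(30 + 60 * sev, 15))
        days.append([sleep, steps, social, night_phone])
    return phq9, label, days


def make_dataset(n_subjects=200, n_days=30, seed=0):
    """Simulate n_subjects x n_days x len(FEATURES) trajectories, reproducibly."""
    rng = random.Random(seed)
    subjects = [simulate_subject(rng, n_days) for _ in range(n_subjects)]
    phq = [s[0] for s in subjects]
    labels = [s[1] for s in subjects]
    data = [s[2] for s in subjects]
    return phq, labels, data


def mean_features(days):
    """Raw handcrafted aggregate: per-feature mean over all days."""
    return [sum(day[f] for day in days) / len(days) for f in range(len(FEATURES))]


def auc(scores, labels):
    """Rank-based (Mann-Whitney) ROC AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    phq, labels, data = make_dataset()
    col = FEATURES.index("night_phone_min")
    night = [mean_features(days)[col] for days in data]
    print(f"prevalence: {sum(labels) / len(labels):.2f}")
    print(f"AUC of mean night-time phone use: {auc(night, labels):.3f}")
```

Because every behavioural shift in this sketch is a per-feature mean offset, a single raw aggregate may already separate the classes fairly well; a benchmark in which depression appears as distributed, temporally structured deviations would need correlated or time-varying effects beyond what is shown here.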

Cite this paper

Filippis, R. D. and Foysal, A. A. (2026). A Lightweight Self-Supervised Representation Learning Framework for Depression Risk Profiling from Synthetic Daily Behavioural Trajectories. Open Access Library Journal, 13, e14918. doi: http://dx.doi.org/10.4236/oalib.1114918.

