AI-Based Early Detection of Alzheimer’s Disease through Speech and Language Biomarkers: A Synthetic Proof-of-Concept Study

doi:10.4236/oalib.1114443

OALib Journal
ISSN: 2333-9721
Open Access Library Journal, Vol. 13, 2026

DOI: 10.4236/oalib.1114443, PP. 1-19

Rocco de Filippis, Abdullah Al Foysal

Subject Areas: Neurology, Artificial Intelligence

Keywords: Alzheimer’s Disease, Early Detection, Digital Biomarkers, Speech Analysis, Natural Language Processing, Machine Learning, Linguistic Biomarkers, Cognitive Decline

Abstract

Early detection of Alzheimer’s disease (AD) is a critical yet unresolved challenge in neurology, as subtle cognitive and linguistic impairments often emerge years before formal diagnosis. Traditional approaches, including neuroimaging and cognitive testing, are limited by cost, invasiveness, and low sensitivity at prodromal stages. Speech and language markers have recently emerged as promising, non-invasive digital biomarkers that can be continuously monitored in naturalistic settings. In this study, we present a proof-of-concept framework that leverages natural language processing (NLP) techniques for automated early AD detection using synthetic speech transcripts. We generated a balanced dataset of 440 samples (220 healthy controls, 220 early AD-like) designed to capture hallmark linguistic alterations associated with AD, including reduced lexical diversity, shorter sentence length, excessive pronoun use, semantic drift, and increased occurrence of fillers and pauses. Each transcript was processed into two complementary feature sets: (i) term frequency-inverse document frequency (TF-IDF) representations of unigrams and bigrams, and (ii) engineered linguistic biomarkers such as type-token ratio, idea density, repetition rate, pronoun ratio, and Flesch reading ease. A logistic regression classifier trained on the combined features achieved strong discriminative performance, with an area under the ROC curve (AUC) of 0.87 and an average precision score of 0.84. Interpretability analysis revealed that features most predictive of AD closely aligned with known linguistic deficits, including filler frequency and pronoun ratio, while lexical diversity and syntactic complexity protected against misclassification. Although this study relies on synthetic data, the framework establishes a transparent, reproducible methodology for integrating speech-based biomarkers into digital phenotyping pipelines. These findings highlight the potential of language analysis for scalable, non-invasive early detection of AD, motivating future validation on real patient cohorts.
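
To make the engineered linguistic biomarkers named in the abstract more concrete, the following is a minimal Python sketch of how a few of them (type-token ratio, pronoun ratio, filler rate, and mean sentence length) might be computed from a transcript. It is not the authors' implementation: the function name `linguistic_features`, the tokenization rule, and the `PRONOUNS` and `FILLERS` word lists are assumptions made purely for illustration, and the two example transcripts are invented.

```python
import re

# Hypothetical word lists for illustration only; the study's lexicons are not given here.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "him", "her", "them", "this", "that"}
FILLERS = {"um", "uh", "er", "hmm"}


def linguistic_features(transcript: str) -> dict:
    """Compute a few simple lexical features from a plain-text transcript."""
    # Split into sentences on terminal punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    # Lowercase word tokens made of letters and apostrophes (an assumed tokenizer).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n = len(tokens)
    if n == 0:
        return {
            "type_token_ratio": 0.0,
            "pronoun_ratio": 0.0,
            "filler_rate": 0.0,
            "mean_sentence_length": 0.0,
        }
    return {
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(tokens)) / n,
        # Share of tokens that are pronouns from the assumed list.
        "pronoun_ratio": sum(t in PRONOUNS for t in tokens) / n,
        # Share of tokens that are filler words from the assumed list.
        "filler_rate": sum(t in FILLERS for t in tokens) / n,
        # Average number of tokens per sentence.
        "mean_sentence_length": n / len(sentences),
    }


# Invented example transcripts, not drawn from the study's dataset.
control = "The boy reaches for the cookie jar while his mother washes dishes."
ad_like = "Um he is uh he is getting it. It is um it is there."

control_features = linguistic_features(control)
ad_like_features = linguistic_features(ad_like)
```

In a pipeline like the one described, a feature vector of this kind would be concatenated with TF-IDF features before being passed to the classifier. On these two invented transcripts, the AD-like example shows lower lexical diversity and higher pronoun and filler rates than the control example, which matches the direction of the alterations the abstract describes.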

Cite this paper

Filippis, R. D. and Foysal, A. A. (2026). AI-Based Early Detection of Alzheimer’s Disease through Speech and Language Biomarkers: A Synthetic Proof-of-Concept Study. Open Access Library Journal, 13, e14443. doi: http://dx.doi.org/10.4236/oalib.1114443.
