%0 Journal Article
%T AI-Based Early Detection of Alzheimer’s Disease through Speech and Language Biomarkers: A Synthetic Proof-of-Concept Study
%A Rocco de Filippis
%A Abdullah Al Foysal
%J Open Access Library Journal
%V 13
%N 1
%P 1-19
%@ 2333-9721
%D 2026
%I Open Access Library
%R 10.4236/oalib.1114443
%X Early detection of Alzheimer’s disease (AD) is a critical yet unresolved challenge in neurology, as subtle cognitive and linguistic impairments often emerge years before formal diagnosis. Traditional approaches, including neuroimaging and cognitive testing, are limited by cost, invasiveness, and low sensitivity at prodromal stages. Speech and language markers have recently emerged as promising, non-invasive digital biomarkers that can be continuously monitored in naturalistic settings. In this study, we present a proof-of-concept framework that leverages natural language processing (NLP) techniques for automated early AD detection using synthetic speech transcripts. We generated a balanced dataset of 440 samples (220 healthy controls, 220 early AD-like) designed to capture hallmark linguistic alterations associated with AD, including reduced lexical diversity, shorter sentence length, excessive pronoun use, semantic drift, and increased occurrence of fillers and pauses. Each transcript was processed into two complementary feature sets: (i) term frequency-inverse document frequency (TF-IDF) representations of unigrams and bigrams, and (ii) engineered linguistic biomarkers such as type-token ratio, idea density, repetition rate, pronoun ratio, and Flesch reading ease. A logistic regression classifier trained on the combined features achieved strong discriminative performance, with an area under the ROC curve (AUC) of 0.87 and an average precision score of 0.84.
Interpretability analysis revealed that features most predictive of AD closely aligned with known linguistic deficits, including filler frequency and pronoun ratio, while lexical diversity and syntactic complexity protected against misclassification. Although this study relies on synthetic data, the framework establishes a transparent, reproducible methodology for integrating speech-based biomarkers into digital phenotyping pipelines. These findings highlight the potential of language analysis for scalable, non-invasive early detection of AD, motivating future validation on real patient cohorts.
%K Alzheimer’s Disease
%K Early Detection
%K Digital Biomarkers
%K Speech Analysis
%K Natural Language Processing
%K Machine Learning
%K Linguistic Biomarkers
%K Cognitive Decline
%U http://www.oalib.com/paper/6877562
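
The abstract names several engineered linguistic biomarkers (type-token ratio, pronoun ratio, repetition rate, filler frequency) without giving their definitions. The following is a minimal sketch of how such features might be computed from a transcript, not the authors' implementation. The word lists `PRONOUNS` and `FILLERS`, the tokenizer, the definition of repetition as an immediately repeated token, and the function name `linguistic_features` are all assumptions made for illustration.

```python
import re

# Hypothetical word lists for illustration; the record does not specify them.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "him", "her", "them", "this", "that"}
FILLERS = {"um", "uh", "er", "hmm"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def linguistic_features(transcript):
    """Return a dict of simple per-transcript lexical features.

    All ratios are relative to the total token count. Repetition is
    assumed here to mean a token immediately followed by itself.
    """
    tokens = tokenize(transcript)
    n = len(tokens)
    if n == 0:
        # Avoid division by zero on empty transcripts.
        return {"type_token_ratio": 0.0, "pronoun_ratio": 0.0,
                "filler_rate": 0.0, "repetition_rate": 0.0}
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return {
        "type_token_ratio": len(set(tokens)) / n,
        "pronoun_ratio": sum(t in PRONOUNS for t in tokens) / n,
        "filler_rate": sum(t in FILLERS for t in tokens) / n,
        "repetition_rate": repeats / n,
    }


features = linguistic_features("um I I went to the the store")
print(features)
```

In a pipeline like the one the abstract describes, a feature vector of this kind would be concatenated with the TF-IDF representation before being passed to the classifier. Idea density and Flesch reading ease need part-of-speech and syllable information and are left out of this sketch.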