
The Effect of Speech Fragmentation and Audio Encodings on Automatic Parkinson’s Disease Recognition

DOI: 10.4236/jbise.2022.151002, PP. 6-25

Keywords: Parkinson’s Disease, Speech, Support Vector Machine, Neural Network, i-Vector, x-Vector



Parkinson’s disease is a neurological disease that is incurable according to current clinical knowledge. Early detection and appropriate treatment are therefore of primary importance. Speech is one of the biomarkers that enable the detection of Parkinson’s disease. Numerous studies are based on recordings made in controlled environments; fewer use recordings made under real-world conditions. In the present study, three aspects were examined: recording fragmentation (paragraph-, sentence- and time-based), variable encodings (Pulse-Code Modulation [PCM], GSM Full Rate [FR], G.723.1) and majority voting on 8 kHz recordings using multiple classifiers. Long Short-Term Memory (LSTM), i-vector and x-vector classifiers were evaluated against a Support Vector Machine (SVM) baseline. The highest accuracy and F1-score were achieved with i-vector models. Although the variable encodings generally reduced Parkinson’s disease recognition performance, the decline was within 2% - 3% at best. Fragmentation did not yield a clear outcome, although some classifiers performed very similarly across the differently fragmented sets. Majority voting produced a slight increase in classification performance compared with using no aggregation.
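
The time-based fragmentation and majority voting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 2-second fragment length, the "PD"/"HC" labels and the rule of dropping a short trailing fragment are all assumptions made here for illustration, and the per-fragment decisions are hard-coded stand-ins for a real classifier.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def split_time_based(samples: Sequence[float], sample_rate: int,
                     fragment_seconds: float) -> List[Sequence[float]]:
    """Cut a recording into consecutive fixed-length fragments.

    A trailing fragment shorter than the requested length is dropped
    (an assumption; the abstract does not say how it is handled).
    """
    frag_len = int(sample_rate * fragment_seconds)
    if frag_len <= 0:
        raise ValueError("fragment length must be at least one sample")
    last_start = len(samples) - frag_len
    return [samples[i:i + frag_len] for i in range(0, last_start + 1, frag_len)]


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("no labels to aggregate")
    return Counter(labels).most_common(1)[0][0]


# 7 s of silence at 8 kHz, a stand-in for a real recording.
recording = [0.0] * (8000 * 7)
fragments = split_time_based(recording, sample_rate=8000, fragment_seconds=2.0)

# Hypothetical per-fragment classifier decisions, one per fragment.
fragment_labels = ["PD", "HC", "PD"]
speaker_label = majority_vote(fragment_labels)
```

Here each fragment would be classified on its own and the speaker-level decision taken as the most common fragment-level label, which is one simple way to aggregate; weighted or score-averaging schemes are equally plausible alternatives.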



