Investigation of Automatic Speech Recognition Systems via the Multilingual Deep Neural Network Modeling Methods for a Very Low-Resource Language, Chaha

DOI: 10.4236/jsip.2020.111001, PP. 1-21

Keywords: Automatic Speech Recognition, Multilingual DNN Modeling Methods, Basic Phone Acoustic Units, Rounded Phone Acoustic Units, Chaha


Abstract:

Automatic speech recognition (ASR) is vital for very low-resource languages, helping to mitigate the risk of language extinction. Chaha is one such language: it suffers from a scarcity of resources, and several of its phonological, morphological, and orthographic features complicate ASR development. In light of these challenges, this study is the first effort to analyze the characteristics of the language, prepare a speech corpus, and develop ASR systems for Chaha. A small 3-hour read-speech corpus was prepared and transcribed. Speech recognizers based on basic and rounded phone units were explored using multilingual deep neural network (DNN) modeling methods. The experimental results show that all basic and rounded phone unit-based multilingual models outperformed the corresponding unilingual models, with relative performance improvements of 5.47% to 19.87% and 5.74% to 16.77%, respectively. The rounded phone unit-based multilingual models outperformed the equivalent basic phone unit-based models, with relative performance improvements of 0.95% to 4.98%. Overall, we found that multilingual DNN modeling methods are highly effective for developing Chaha speech recognizers. Both basic and rounded phone acoustic units are suitable for building Chaha ASR systems; however, the rounded phone unit-based models achieve higher accuracy and faster recognition than the corresponding basic phone unit-based models. Hence, rounded phone units are the most suitable acoustic units for developing Chaha ASR systems.
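The relative performance improvements quoted above are presumably relative reductions in word error rate (WER), the standard metric for comparing ASR models. A minimal sketch of that computation, using hypothetical WER values (the actual per-model WERs are not given in this abstract):

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of a new model over a baseline, in percent.

    A positive value means the new model makes fewer errors than the baseline.
    """
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical example: a unilingual baseline at 40.0% WER and a
# multilingual model at 35.0% WER give a 12.5% relative improvement,
# which falls within the 5.47%-19.87% range reported for the basic
# phone unit-based models.
print(round(relative_improvement(40.0, 35.0), 2))  # 12.5
```

Note that a relative improvement (here 12.5%) is larger than the corresponding absolute WER difference (5 percentage points), which is why ASR papers typically state which of the two they report.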

