Resources for Development of Hindi Speech Synthesis System: An Overview

DOI: 10.4236/ojapps.2017.76020, PP. 233-241

Keywords: Speech, Database, Corpora, Lexicon, Speech Synthesis, Linguistics, Natural Language Processing

Abstract:

Most of the information in the digital world is accessible only to the few who can read or understand a particular language. Speech corpus acquisition is an essential part of all spoken language technology systems, and the quality and volume of speech data in a corpus directly affect the accuracy of the system. There is considerable scope for developing speech technology systems for Hindi, which is spoken primarily in India. To achieve such an ambitious goal, the collection of a standard database is a prerequisite. This paper summarizes the Hindi corpora and lexical resources being developed by various organizations across the country.

