
Tibetan Speech Recognition Based on Deep Neural Networks*

DOI: 10.16451/j.cnki.issn1003-6059.201503003, PP. 209-213

Keywords: Tibetan, continuous speech recognition, data-driven, deep neural network (DNN)


Abstract:

This paper is the first to address large-vocabulary continuous speech recognition of natural conversational telephone speech in Tibetan. As a minority language, Tibetan poses one central difficulty for recognition: data sparsity. In deep neural network (DNN) based acoustic modeling, the paper addresses this sparsity by initializing the target model with a DNN already trained on data from a resource-rich language, and then optimizing it on the Tibetan data. In addition, because the phonetics of Tibetan is not yet well studied, manually constructing decision-tree question sets is infeasible; the paper therefore generates the question sets automatically in a data-driven way and uses them to tie the states of triphone hidden Markov models (HMMs), reducing the number of model parameters that must be estimated. On the test set, Gaussian mixture model (GMM) based acoustic modeling achieves a Tibetan-character recognition accuracy of 30.86%. With DNN-based acoustic modeling initialized from DNNs trained on three resource-rich languages, the effectiveness of the method is verified on the same test set, where Tibetan-character accuracy reaches 43.26%.
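The cross-lingual initialization strategy described above can be sketched as follows. This is a minimal illustration in plain NumPy, not the paper's actual implementation: all function and variable names, layer sizes, and the initialization scale are assumptions for the example. The idea is that the hidden layers of a DNN trained on a resource-rich language are copied into the target model, while only the softmax output layer is re-initialized to match the Tibetan senone (tied-state) set, after which the whole network would be fine-tuned on the sparse Tibetan data.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_target_dnn(source_layers, n_target_senones):
    """Build a target-language DNN from a source-language DNN.

    source_layers: list of (W, b) pairs; the last pair is the softmax layer.
    Hidden layers are copied verbatim (this is the cross-lingual transfer),
    while the output layer is re-initialized, since the source and target
    languages do not share senone labels.
    """
    hidden = [(W.copy(), b.copy()) for W, b in source_layers[:-1]]
    n_in = source_layers[-1][0].shape[0]          # width of last hidden layer
    W_out = rng.normal(0.0, 0.01, size=(n_in, n_target_senones))
    b_out = np.zeros(n_target_senones)
    return hidden + [(W_out, b_out)]

def forward(layers, x):
    """Sigmoid hidden layers followed by a softmax output layer."""
    for W, b in layers[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    W, b = layers[-1]
    z = x @ W + b
    e = np.exp(z - z.max())                        # numerically stable softmax
    return e / e.sum()

# Toy "source-language" network: 40-dim acoustic features, two hidden
# layers of 64 units, 1000 source-language senones (sizes are illustrative).
src = [(rng.normal(0, 0.1, (40, 64)), np.zeros(64)),
       (rng.normal(0, 0.1, (64, 64)), np.zeros(64)),
       (rng.normal(0, 0.1, (64, 1000)), np.zeros(1000))]

# Target network for a hypothetical 300-senone Tibetan state inventory.
tgt = init_target_dnn(src, n_target_senones=300)
p = forward(tgt, rng.normal(0, 1.0, 40))           # posterior over 300 senones
```

In practice the copied hidden layers are not frozen: the whole target network is fine-tuned with backpropagation on the Tibetan training data, so the source-language weights serve only as a better-than-random starting point.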

