
Tibetan Speech Recognition Based on Deep Neural Networks*

DOI: 10.16451/j.cnki.issn1003-6059.201503003, PP. 209-213

Keywords: Tibetan, continuous speech recognition, data-driven, deep neural network (DNN)


Abstract:

This paper is the first to address large-vocabulary continuous speech recognition of natural conversational telephone speech in Tibetan. As a minority language, Tibetan poses one central difficulty for recognition: data sparsity. In deep neural network (DNN) based acoustic modeling, the paper addresses this sparsity by initializing the target model with a DNN already trained on data from a resource-rich language, and then optimizing it on the Tibetan data. In addition, because the phonetics of Tibetan is not yet well studied, manually constructing decision-tree question sets is infeasible; the paper therefore generates the question sets automatically in a data-driven way and uses them to tie the states of triphone hidden Markov models (HMMs), reducing the number of model parameters that must be estimated. On the test set, Gaussian mixture model (GMM) based acoustic modeling achieves a Tibetan-character recognition accuracy of 30.86%. With DNN-based acoustic modeling initialized from DNNs trained on three resource-rich languages, the effectiveness of the method is verified on the same test set, where Tibetan-character accuracy reaches 43.26%.
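The cross-lingual initialization strategy described above can be sketched as follows. This is a minimal illustration in plain NumPy, not the paper's actual implementation: all function and variable names, layer sizes, and the initialization scale are assumptions for the example. The idea is that the hidden layers of a DNN trained on a resource-rich language are copied into the target model, while only the softmax output layer is re-initialized to match the Tibetan senone (tied-state) set, after which the whole network would be fine-tuned on the sparse Tibetan data.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_target_dnn(source_layers, n_target_senones):
    """Build a target-language DNN from a source-language DNN.

    source_layers: list of (W, b) pairs; the last pair is the softmax layer.
    Hidden layers are copied verbatim (this is the cross-lingual transfer),
    while the output layer is re-initialized, since the source and target
    languages do not share senone labels.
    """
    hidden = [(W.copy(), b.copy()) for W, b in source_layers[:-1]]
    n_in = source_layers[-1][0].shape[0]          # width of last hidden layer
    W_out = rng.normal(0.0, 0.01, size=(n_in, n_target_senones))
    b_out = np.zeros(n_target_senones)
    return hidden + [(W_out, b_out)]

def forward(layers, x):
    """Sigmoid hidden layers followed by a softmax output layer."""
    for W, b in layers[:-1]:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    W, b = layers[-1]
    z = x @ W + b
    e = np.exp(z - z.max())                        # numerically stable softmax
    return e / e.sum()

# Toy "source-language" network: 40-dim acoustic features, two hidden
# layers of 64 units, 1000 source-language senones (sizes are illustrative).
src = [(rng.normal(0, 0.1, (40, 64)), np.zeros(64)),
       (rng.normal(0, 0.1, (64, 64)), np.zeros(64)),
       (rng.normal(0, 0.1, (64, 1000)), np.zeros(1000))]

# Target network for a hypothetical 300-senone Tibetan state inventory.
tgt = init_target_dnn(src, n_target_senones=300)
p = forward(tgt, rng.normal(0, 1.0, 40))           # posterior over 300 senones
```

In practice the copied hidden layers are not frozen: the whole target network is fine-tuned with backpropagation on the Tibetan training data, so the source-language weights serve only as a better-than-random starting point.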

