A Speech Recognition System Based on Bottleneck Features and SGMM Models under Low Data Resource Conditions (2015)
Abstract:
Speech recognition systems require large amounts of labeled training data, and their recognition performance under low data resource conditions is often unsatisfactory. To address this data scarcity, this paper first studies the subspace Gaussian mixture model (SGMM) for acoustic modeling, which reduces the number of parameters to be estimated through parameter sharing, and applies discriminative training based on the maximum mutual information (MMI) criterion to improve recognition accuracy. At the feature level, Bottleneck features derived from deep neural networks are then used for feature extraction and dimensionality reduction. Finally, these techniques are combined to build a speech recognition system for low-resource conditions. Experimental results on the standard international OpenKWS 2013 database show that the proposed techniques effectively improve recognition performance under low-resource conditions, yielding a word error rate reduction of about 12% relative to the baseline system.
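To make the parameter-sharing idea behind the SGMM concrete, a minimal sketch of the standard SGMM state likelihood and the MMI training objective is given below, following the usual formulation in the literature (Povey et al.); the notation here is supplied for illustration and is not taken from the paper itself.

$$
p(\mathbf{x} \mid j) = \sum_{m=1}^{M_j} c_{jm} \sum_{i=1}^{I} w_{jmi}\, \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_{jmi}, \boldsymbol{\Sigma}_i\right),
\qquad
\boldsymbol{\mu}_{jmi} = \mathbf{M}_i \mathbf{v}_{jm},
\qquad
w_{jmi} = \frac{\exp\!\left(\mathbf{w}_i^{\mathsf{T}} \mathbf{v}_{jm}\right)}{\sum_{i'=1}^{I} \exp\!\left(\mathbf{w}_{i'}^{\mathsf{T}} \mathbf{v}_{jm}\right)}.
$$

The parameters $\{\mathbf{M}_i, \mathbf{w}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{I}$ (subspace projections, weight projections, and covariances) are shared globally across all HMM states, so each state $j$ with substates $m$ only estimates the low-dimensional state vectors $\mathbf{v}_{jm}$ and weights $c_{jm}$; this is what keeps the number of state-specific parameters small when training data are scarce. Discriminative training then maximizes the MMI objective

$$
\mathcal{F}_{\mathrm{MMI}}(\lambda) = \sum_{r=1}^{R} \log \frac{p_{\lambda}\!\left(\mathbf{O}_r \mid \mathcal{M}_{s_r}\right)^{\kappa} P(s_r)}{\sum_{s} p_{\lambda}\!\left(\mathbf{O}_r \mid \mathcal{M}_{s}\right)^{\kappa} P(s)},
$$

where $\mathbf{O}_r$ is the $r$-th training utterance, $s_r$ its reference transcription, $\mathcal{M}_s$ the composite model for word sequence $s$, and $\kappa$ an acoustic scaling factor.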