- 2017
A Normalization Method Based on Identity Vectors (i-vectors) for Deep Neural Network Adaptation
Abstract:
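Deep neural networks (DNNs) have become a very popular acoustic modeling technique for speech recognition in recent years, and they perform significantly better than the previously dominant Gaussian mixture models (GMMs). Speaker adaptation of DNNs, however, has not yet been solved well. This paper adapts a DNN using identity vectors (i-vectors), studies how i-vector normalization affects the system, and proposes a new max-min linear normalization technique. Experiments on the TIMIT dataset show that the technique reduces the word error rate by 5.10% relative to the conventional method.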
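The abstract does not give the formula for the max-min linear normalization, so the following is only a minimal sketch of ordinary min-max scaling applied to a single i-vector. The function names, the target range [-1, 1], the choice to normalize per vector rather than per dimension, and the step of appending the result to an acoustic frame are all assumptions made for illustration, not the authors' confirmed method.

```python
from typing import List, Sequence


def min_max_normalize(vec: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Linearly rescale one vector so its minimum maps to lo and its maximum to hi.

    Assumption: per-vector scaling to [-1, 1]; the paper may define the range
    or the statistics (e.g. per dimension over all speakers) differently.
    """
    v_min, v_max = min(vec), max(vec)
    span = v_max - v_min
    if span == 0.0:
        # A constant vector carries no contrast; map every entry to the midpoint.
        return [(lo + hi) / 2.0 for _ in vec]
    scale = (hi - lo) / span
    return [lo + (x - v_min) * scale for x in vec]


def append_ivector(frame: Sequence[float], ivector: Sequence[float]) -> List[float]:
    """Hypothetical helper: concatenate a normalized i-vector to one acoustic frame.

    Assumption: the speaker information reaches the DNN as extra input features.
    """
    return list(frame) + min_max_normalize(ivector)


if __name__ == "__main__":
    toy_ivector = [2.0, 4.0, 6.0]   # toy values, not real i-vector data
    toy_frame = [0.1, 0.2]          # toy acoustic features
    print(min_max_normalize(toy_ivector))
    print(append_ivector(toy_frame, toy_ivector))
```

Scaling to a fixed range keeps the i-vector entries on a scale comparable to normalized acoustic features, which is one plausible reason normalization matters when both are fed to the same network input layer.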