2017

Investigation of normalization methods in speaker adaptation of deep neural network using i-vector

DOI: 10.7523/j.issn.2095-6134.2017.05.014

Keywords: identity vector (i-vector), deep neural network, speaker adaptation, normalization


Abstract:

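Deep neural networks (DNNs) have in recent years become a very popular acoustic modeling technique for speech recognition, and they significantly outperform the previously dominant Gaussian mixture models (GMMs). Speaker adaptation of DNNs, however, has not yet been solved satisfactorily. This paper adapts a DNN using identity vectors (i-vectors), studies how i-vector normalization affects the system, and proposes a new max-min linear normalization technique. Experiments on the TIMIT dataset show that the technique reduces the word error rate by 5.10% relative to the conventional method.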
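The abstract does not give the formula for the max-min linear normalization, so the following is only a minimal sketch of a generic per-dimension min-max rescaling, not the authors' exact method. The target interval `[0, 1]`, the function names `max_min_normalize` and `append_ivector`, and the choice to append the normalized i-vector to every acoustic frame are all assumptions made for illustration.

```python
def max_min_normalize(ivectors, lo=0.0, hi=1.0):
    """Linearly rescale each i-vector dimension to [lo, hi].

    ivectors: list of equal-length lists, one i-vector per speaker.
    The [lo, hi] interval is an assumption; the abstract does not state it.
    """
    columns = list(zip(*ivectors))          # one tuple per dimension
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for vec in ivectors:
        row = []
        for x, mn, mx in zip(vec, mins, maxs):
            span = mx - mn
            # A constant dimension carries no speaker information; map it to lo.
            scaled = 0.0 if span == 0 else (x - mn) / span
            row.append(lo + scaled * (hi - lo))
        normalized.append(row)
    return normalized


def append_ivector(frames, ivector):
    """Append one speaker's i-vector to every acoustic frame (hypothetical input layout)."""
    return [list(frame) + list(ivector) for frame in frames]


# Toy data: three speakers with 2-dimensional i-vectors.
ivecs = [[2.0, -1.0], [4.0, 3.0], [3.0, 1.0]]
normed = max_min_normalize(ivecs)

# Five 3-dimensional acoustic frames for the third speaker.
frames = [[0.0, 0.0, 0.0] for _ in range(5)]
dnn_input = append_ivector(frames, normed[2])
```

In this sketch the minimum and maximum are computed over the vectors passed in; in a real system they would presumably be estimated on the training speakers and reused unchanged at test time.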

