Acta Automatica Sinica (自动化学报), 2015

An Acoustic Model for Speech Recognition Based on the Nonlinear Manifold Structure of the Acoustic Feature Space

DOI: 10.16383/j.aas.2015.c140399, PP. 1024-1033

张文林, 牛铜, 屈丹, 李弼程, 裴喜龙

Keywords: speech recognition, acoustic model, nonlinear manifold, mixture of factor analyzers

Abstract:

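Starting from the nonlinear manifold structure of the acoustic feature space of speech signals, this paper builds a new acoustic model for speech recognition using the principle of compressive sensing on manifolds. The feature space is partitioned into multiple local regions, and each local region is approximated by a low-dimensional factor analysis model, which yields a mixture of factor analyzers. The observation vectors of each context-dependent state are constrained to lie on this nonlinear low-dimensional manifold, and the corresponding observation probability model is derived. As a result, each state is determined by a weight vector subject to a sparsity constraint and by several low-dimensional local factor vectors that follow the standard normal distribution. The paper gives a criterion for determining the latent dimension of each local region and an iterative algorithm for estimating the model parameters. Continuous speech recognition experiments on the RM corpus show that, compared with the conventional Gaussian mixture model (GMM) and the subspace Gaussian mixture model (SGMM), the new acoustic model achieves relative reductions in average word error rate (WER) on the test sets of 33.1% and 9.2%, respectively.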
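To make the "local regions with low-dimensional factors" idea concrete, the following is a minimal sketch of the generative view of a generic mixture of factor analyzers. It is not the paper's state-dependent model or its estimation algorithm. The function name `sample_mfa`, its parameters, and the choice of a single diagonal noise variance shared by all regions are assumptions made for illustration. A sparse weight vector simply means that most regions have zero weight and are never selected.

```python
import random


def sample_mfa(weights, means, loadings, noise_vars, rng):
    """Draw one observation from a generic mixture of factor analyzers.

    Hypothetical illustration, not the paper's model.
    weights    : K mixture weights (non-negative, summing to 1; may be sparse)
    means      : K mean vectors, each of length D
    loadings   : K loading matrices, each with D rows and q_k columns
                 (the latent dimension q_k may differ per region)
    noise_vars : D diagonal noise variances (assumed shared by all regions)
    rng        : a random.Random instance
    """
    # Choose a local region according to the mixture weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Low-dimensional local factor vector, z ~ N(0, I).
    q = len(loadings[k][0])
    z = [rng.gauss(0.0, 1.0) for _ in range(q)]
    # Observation: x = mean_k + loadings_k * z + diagonal Gaussian noise.
    x = []
    for d, mu_d in enumerate(means[k]):
        low_dim_part = sum(loadings[k][d][j] * z[j] for j in range(q))
        noise = rng.gauss(0.0, noise_vars[d] ** 0.5)
        x.append(mu_d + low_dim_part + noise)
    return k, z, x


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.7, 0.3, 0.0]  # sparse: the third region is never used
    means = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [-5.0, 0.0, 5.0]]
    loadings = [
        [[1.0], [0.5], [0.0]],                   # region 0: 1 latent dimension
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],    # region 1: 2 latent dimensions
        [[1.0], [1.0], [1.0]],                   # region 2: 1 latent dimension
    ]
    noise_vars = [0.01, 0.01, 0.01]
    k, z, x = sample_mfa(weights, means, loadings, noise_vars, rng)
    print("region:", k, "factors:", z, "observation:", x)
```

Each sample lies close to a low-dimensional affine patch around the chosen region's mean. The union of such patches is how a mixture of factor analyzers approximates a nonlinear manifold.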
