Acta Automatica Sinica (自动化学报), 2012

A Regularization-Based Eigenvoice Speaker Adaptation Method

DOI: 10.3724/SP.J.1004.2012.01950, pp. 1950-1957

Zhang Wen-Lin (张文林), Zhang Lian-Hai (张连海), Niu Tong (牛铜), Qu Dan (屈丹), Li Bi-Cheng (李弼程)

Keywords: speech recognition, speaker adaptation, eigenvoice, regularization, elastic net


Abstract:

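Regularization is applied to eigenvoice speaker adaptation, which effectively addresses the problem of choosing the speaker subspace basis a priori. By adding a suitable regularization term to the likelihood function, the optimization automatically selects the best eigenvoices from a set of candidate eigenvoice basis vectors and combines them linearly. The paper discusses three regularizers and gives a mathematical optimization algorithm for each. l1 regularization yields a sparse solution for the speaker factors, whose nonzero entries correspond to the best eigenvoice basis vectors. l2 regularization makes the solution more robust and, to some extent, reduces the effect that the a priori choice of subspace dimension has on the recognition rate. Elastic net regularization strikes a compromise between the two through a linear combination of both penalties. Supervised speaker adaptation experiments show that, compared with the best result of the eigenvoice method, the new method achieves a relative improvement in recognition rate of nearly 1% to 2% when little adaptation data is available (under 10 s). Of the three methods, l1 regularization slightly outperforms l2 regularization, and introducing elastic net regularization improves system performance further.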
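The effect of the three penalties can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract regularizes a likelihood function, whereas this sketch assumes a plain squared-error data term as a stand-in, and solves it with a generic iterative soft-thresholding loop. The names `E` (candidate basis vectors as columns), `y` (target vector), `lam1`, `lam2`, and `elastic_net_weights` are all hypothetical. Setting `lam2 = 0` gives the l1 case, `lam1 = 0` the l2 case, and both nonzero the elastic net.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def elastic_net_weights(E, y, lam1, lam2, n_iter=500):
    """Minimize 0.5*||y - E w||^2 + lam1*||w||_1 + 0.5*lam2*||w||^2.

    Assumption: a squared-error term replaces the likelihood used in the paper.
    E is a (D, K) matrix of candidate basis vectors, y a (D,) target,
    and the returned w is a (K,) vector of combination weights.
    """
    w = np.zeros(E.shape[1])
    # Step size from the Lipschitz constant of the smooth part's gradient:
    # largest singular value of E squared, plus lam2.
    step = 1.0 / (np.linalg.norm(E, 2) ** 2 + lam2)
    for _ in range(n_iter):
        grad = E.T @ (E @ w - y) + lam2 * w  # gradient of the smooth terms
        w = soft_threshold(w - step * grad, step * lam1)  # l1 shrinkage
    return w


if __name__ == "__main__":
    # Toy example with an orthonormal basis, so the effect is easy to read.
    E = np.eye(4)
    y = np.array([3.0, 0.5, -2.0, 0.1])
    print("l1 only:    ", elastic_net_weights(E, y, lam1=1.0, lam2=0.0))
    print("l2 only:    ", elastic_net_weights(E, y, lam1=0.0, lam2=1.0))
    print("elastic net:", elastic_net_weights(E, y, lam1=1.0, lam2=1.0))
```

In this toy setting the l1 penalty drives the two small weights exactly to zero, which is the basis-selection behaviour the abstract describes, while the l2 penalty only shrinks all weights toward zero without removing any.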

