Research on Joint Adaptation Algorithms for Language Recognition Systems Based on Phoneme Decoding

DOI: 10.3724/SP.J.1004.2012.00652, PP. 652-658

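Keywords: language recognition, phone recognizer followed by vector space model, joint adaptation, constrained maximum likelihood linear regression, support vector machine adaptation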

Abstract:

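In real-world language recognition, differences in non-language factors such as channel type and spoken content cause a mismatch between test and training conditions, which degrades recognition performance. This paper takes a phone recognizer followed by vector space model (PRVSM) as the language recognition system and introduces a joint adaptation algorithm to address the mismatch between its test and training conditions. Three adaptation methods are studied, each applied at a different stage of the system: 1) acoustic model adaptation based on constrained maximum likelihood linear regression (CMLLR); 2) phonotactic feature vector adaptation based on a global N-gram model; 3) support vector machine (SVM) adaptation in the VSM model. With these adaptation techniques combined, the performance of the PRVSM system improves considerably: on the NIST LRE 2009 test set, for 30 s, 10 s, and 3 s test segments, the equal error rate (EER) of PRVSM systems built on different phone recognizers falls by a relative 18% to 23%, 12% to 20%, and 5% to 9%, respectively.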
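
The phonotactic feature vector that the second adaptation stage operates on can be pictured with a small example. The sketch below is only an illustration of the general PRVSM idea, not the paper's implementation: it turns a decoded phone sequence into a vector of N-gram relative frequencies, which a VSM back end such as an SVM could then classify. The function name `ngram_feature_vector`, the toy phone inventory, and the plain relative-frequency weighting are all assumptions made for illustration; the adaptation steps themselves are not shown.

```python
from collections import Counter
from itertools import product


def ngram_feature_vector(phones, inventory, n=2):
    """Map a decoded phone sequence to a vector of n-gram relative frequencies.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Enumerate every possible n-gram so all utterances share one dimension order.
    vocab = list(product(sorted(inventory), repeat=n))
    # Count the n-grams that actually occur in this utterance.
    counts = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(counts[g] for g in vocab)
    # Normalize counts to relative frequencies; an empty utterance gives all zeros.
    return [counts[g] / total if total else 0.0 for g in vocab]


# Toy decoder output over an assumed three-phone inventory.
vec = ngram_feature_vector(["a", "b", "a", "b", "c"], {"a", "b", "c"}, n=2)
```

With a real phone recognizer the inventory holds dozens of phones, so the vector is high-dimensional and sparse; that is one reason a vector space model with an SVM is a natural fit for the back end.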
