Bottleneck Feature Extraction Method Based on Hierarchical Sparse DBN

DOI: 10.16451/j.cnki.issn1003-6059.201502010, pp. 173-180

Keywords: Phoneme Recognition, Deep Belief Network (DBN), Overlapping Group Lasso, Hierarchical Structure

Abstract:

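Existing speech features cannot effectively exploit long-span speech context or supervised class information, and existing bottleneck feature extraction methods are too time-consuming. To address these shortcomings, a bottleneck feature extraction method based on a hierarchically structured sparse deep belief network (DBN) is proposed. The method uses the overlapping group lasso as a sparse regularization term in the DBN objective function, yielding a sparse DBN that trains faster. A hierarchical architecture then cascades two sparse DBNs in series, further strengthening the discriminative power of the bottleneck features. The resulting bottleneck features are applied to phoneme recognition, and experiments demonstrate their effectiveness.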
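The abstract does not spell out the exact form of the regularizer or of the cascade. As a minimal sketch, assuming the standard overlapping group lasso penalty λ·Σ_g η_g·‖w_g‖₂ (where the index groups g may share weights) and an assumed context-stacking step between the two sparse DBNs, the pure-Python helpers below illustrate both ideas. The function names, the group layout, the per-group weights η_g and the context widths are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def overlapping_group_lasso(
    w: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam: float,
    eta: Optional[Sequence[float]] = None,
) -> float:
    """Penalty lam * sum_g eta_g * ||w_g||_2; index sets in `groups` may overlap."""
    # Hypothetical default: every group weighted equally.
    eta = eta if eta is not None else [1.0] * len(groups)
    return lam * sum(
        e * math.sqrt(sum(w[i] ** 2 for i in g)) for g, e in zip(groups, eta)
    )


def overlapping_group_lasso_subgrad(
    w: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam: float,
    eta: Optional[Sequence[float]] = None,
) -> List[float]:
    """One subgradient of the penalty above.

    A weight that belongs to several groups accumulates a contribution
    from each group containing it, which is what couples overlapping groups.
    """
    eta = eta if eta is not None else [1.0] * len(groups)
    grad = [0.0] * len(w)
    for g, e in zip(groups, eta):
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        if norm == 0.0:
            # 0 lies in the subdifferential of ||w_g||_2 at the origin.
            continue
        for i in g:
            grad[i] += lam * e * w[i] / norm
    return grad


def stack_context(
    frames: Sequence[Sequence[float]], left: int, right: int
) -> List[List[float]]:
    """Concatenate each frame with `left` past and `right` future frames.

    Edge frames are repeated as padding. In an assumed hierarchical setup,
    the first DBN's per-frame bottleneck outputs would pass through this
    function before being fed to the second DBN.
    """
    n = len(frames)
    out: List[List[float]] = []
    for t in range(n):
        row: List[float] = []
        for k in range(t - left, t + right + 1):
            row.extend(frames[min(max(k, 0), n - 1)])
        out.append(row)
    return out
```

For example, with w = [3, 4, 0, 0] and the overlapping groups [[0, 1], [1, 2], [2, 3]], the shared weight w[1] receives a subgradient contribution from both groups that contain it, while the all-zero group [2, 3] contributes nothing. Adding this subgradient (scaled by a learning rate) to the usual DBN fine-tuning gradient is one simple way to apply the penalty; proximal updates are a common alternative.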
