


Spectral Modeling and Unit Selection Speech Synthesis Method Based on Restricted Boltzmann Machines

DOI: 10.16451/j.cnki.issn1003-6059.201508001, pp. 673-679

Keywords: Speech Synthesis, Unit Selection, Hidden Markov Model, Restricted Boltzmann Machine


Abstract:

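A spectral modeling and unit selection speech synthesis method based on restricted Boltzmann machines (RBMs) is proposed. In the model training stage, RBMs are used to model spectral features that carry rich detail, such as spectral envelopes and short-time amplitude spectra. This replaces the conventional spectral modeling approach, which uses single Gaussian models with diagonal covariance over mel-cepstral features, and improves the acoustic model's ability to describe spectral features. In the synthesis stage, the trained RBMs are used to compute the log-likelihoods of the spectral features of the candidate units, and the target cost function for unit selection is then constructed through a piecewise linear mapping of these log-likelihoods. Experiments show that the proposed method effectively improves the naturalness of the synthesized speech.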
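The abstract does not specify the RBM parameterization, how frame scores are pooled over a unit, or the breakpoints of the piecewise linear mapping. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian-Bernoulli RBM with unit-variance visible units (a common choice for real-valued spectra), averages per-frame scores over a candidate unit, and uses hypothetical breakpoints `xs`/`ys`. Because the partition function log Z is identical for every candidate scored by the same RBM, it is dropped, so the breakpoints here are defined on the unnormalized scale. All function names are hypothetical.

```python
import math
from typing import Sequence

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def unnormalized_log_likelihood(v: Sequence[float],
                                W: Sequence[Sequence[float]],
                                b: Sequence[float],
                                c: Sequence[float]) -> float:
    """Return -F(v) for an assumed Gaussian-Bernoulli RBM with unit-variance visibles.

    log p(v) = -F(v) - log Z, where
    F(v) = sum_i (v_i - b_i)^2 / 2 - sum_j softplus(c_j + sum_i v_i W_ij).
    log Z is constant across candidates, so it is omitted here.
    """
    quad = 0.5 * sum((vi - bi) ** 2 for vi, bi in zip(v, b))
    hidden = 0.0
    for j, cj in enumerate(c):
        act = cj + sum(v[i] * W[i][j] for i in range(len(v)))
        hidden += softplus(act)
    return -quad + hidden

def piecewise_linear(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Interpolate linearly through breakpoints (xs ascending); clamp outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(1, len(xs)):
        if x <= xs[k]:
            t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + t * (ys[k] - ys[k - 1])
    return ys[-1]  # not reached for ascending xs

def spectral_target_cost(frames: Sequence[Sequence[float]],
                         W: Sequence[Sequence[float]],
                         b: Sequence[float],
                         c: Sequence[float],
                         xs: Sequence[float],
                         ys: Sequence[float]) -> float:
    """Hypothetical target cost: mean frame score of a candidate unit, mapped piecewise-linearly."""
    avg = sum(unnormalized_log_likelihood(f, W, b, c) for f in frames) / len(frames)
    return piecewise_linear(avg, xs, ys)

# Toy usage with a 2-dim visible layer and 1 hidden unit (illustrative values only).
W = [[0.0], [0.0]]
b = [0.0, 0.0]
c = [0.0]
xs, ys = [-10.0, 0.0], [1.0, 0.0]  # low likelihood -> cost 1, high likelihood -> cost 0
good = spectral_target_cost([[0.0, 0.0]], W, b, c, xs, ys)
bad = spectral_target_cost([[3.0, 3.0]], W, b, c, xs, ys)
```

With these toy values the candidate whose spectrum lies near the RBM's visible bias scores higher and receives the lower target cost. A decreasing mapping is used so that a higher model likelihood makes a candidate cheaper to select; the actual shape and breakpoints would have to be tuned on real data.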

