Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2010

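A Minimum Generation Error Model Training Method for Speech Synthesis Based on Perceptually Weighted Line Spectral Pair Distance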

pp. 572-579

Lei Ming (雷鸣), Ling Zhenhua (凌震华), Dai Lirong (戴礼荣)

Keywords: speech synthesis, hidden Markov model (HMM), minimum generation error (MGE), perceptual weighting, line spectral pair parameters

Abstract:

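This paper proposes a minimum generation error (MGE) model training method based on a perceptually weighted line spectral pair (LSP) distance, with the aim of improving the performance of parametric speech synthesis systems based on hidden Markov models (HMMs). When LSP parameters are used to represent the spectral features of speech, the Euclidean-distance generation error used in conventional MGE training does not reflect the true distance between the generated and natural spectra well. A generation error function defined by the log spectral distortion (LSD), which does not depend on the choice of spectral parameters, alleviates this problem, but the subjective improvement is not obvious and the computational complexity is very high. The paper first proposes MGE training based on a weighted LSP distance, then compares different weighting methods, as well as LSD-based MGE training, in both objective and subjective experiments. Finally, a perceptual weighting method is identified that not only gives better subjective results but also adds almost no computational complexity compared with conventional MGE training.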
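
To make the contrast between a Euclidean and a weighted LSP distance concrete, here is a minimal sketch of a per-frame generation error. The abstract does not state which weighting the authors finally chose, so the weights below are an assumption: an inverse-harmonic-mean style weighting known from the LSP quantization literature, which gives more weight to LSPs that lie close to their neighbours (typically near spectral peaks). The function names `ihm_weights` and `weighted_lsp_distance` are hypothetical and are not taken from the paper.

```python
import math


def ihm_weights(lsp):
    """Assumed weighting (not necessarily the paper's): each LSP is weighted by
    the inverse distances to its two neighbours, with 0 and pi as boundaries."""
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_distance(natural, generated, weights=None):
    """Weighted squared distance between two LSP vectors (radians).
    With weights=None this reduces to the squared Euclidean distance."""
    if len(natural) != len(generated):
        raise ValueError("LSP vectors must have the same order")
    if weights is None:
        weights = [1.0] * len(natural)
    return sum(w * (n - g) ** 2 for w, n, g in zip(weights, natural, generated))


# Toy 4th-order frame: the first two LSPs are close together.
natural = [0.30, 0.35, 1.20, 2.00]
err_near_pair = [0.30, 0.36, 1.20, 2.00]  # 0.01 rad error on a closely spaced LSP
err_isolated = [0.30, 0.35, 1.20, 2.01]   # same error on an isolated LSP

w = ihm_weights(natural)
print(weighted_lsp_distance(natural, err_near_pair),
      weighted_lsp_distance(natural, err_isolated))        # Euclidean: about equal
print(weighted_lsp_distance(natural, err_near_pair, w),
      weighted_lsp_distance(natural, err_isolated, w))     # weighted: first is larger
```

Because the weights are computed once from the natural LSPs and the distance remains a weighted sum of squares, the cost per frame stays close to that of the Euclidean error, which is in line with the abstract's remark that the complexity barely increases.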

