An Improved Model Adaptation Method for Cross-Lingual Speech Synthesis

pp. 457-463

Keywords: hidden Markov model (HMM), speech synthesis, cross-lingual model adaptation, phoneme mapping


Abstract:

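In statistical parametric speech synthesis, cross-lingual model adaptation is mainly used when the target speaker's language differs from the language of the source model: a small amount of speech data from the target speaker is used to quickly build a synthesis system in the source model's language that carries the target speaker's voice characteristics. This paper improves the conventional cross-lingual adaptation method based on phoneme mapping and triphone models in two ways. First, it combines phoneme mapping with data selection to make the phoneme mapping more reliable. Second, it introduces cross-lingual mapping of prosodic information to compensate for the triphone model's weakness in representing prosody. Experiments on a Chinese-English cross-lingual model adaptation system show that speech synthesized by the improved system is clearly better than that of the conventional method in both naturalness and speaker similarity.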

