%0 Journal Article %T Text-To-Visual Speech in Chinese Based on Data-Driven Approach
%A WANG Zhi-Ming %A CAI Lian-Hong %A AI Hai-Zhou %J 软件学报 (Journal of Software) %D 2005 %X Text-to-visual speech (TTVS) synthesis by computer can increase speech intelligibility and make human-computer interfaces friendlier. This paper describes a Chinese text-to-visual speech synthesis system based on a data-driven (sample-based) approach, realized by concatenating short video segments. An effective method is developed for constructing two visual confusion trees, one for Chinese initials and one for finals. A co-articulation model based on visual distance and a hardness factor is proposed; it is used both for selecting recording-corpus sentences in the analysis phase and for unit selection in the synthesis phase. Visible discontinuities between the boundary images of concatenated video segments are smoothed by image morphing. Combined with acoustic text-to-speech (TTS) synthesis, a complete Chinese text-to-visual speech synthesis system is realized. %K text-to-speech (TTS) %K text-to-visual speech (TTVS) %K viseme %K co-articulation
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=EB1E1486E7E6BC73&yid=2DD7160C83D0ACED&vid=7801E6FC5AE9020C&iid=B31275AF3241DB2D&sid=1A775ABEB80E436B&eid=EE05CC1F800E4629&journal_id=1000-9825&journal_name=软件学报&referenced_num=6&reference_num=33