- 2017
Speech Emotion Recognition Based on Local Feature Optimization
Abstract:
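Emotion recognition has broad prospects in human-computer interaction. Because emotional expression persists over time, statistical features better reflect the differences and dynamic changes among speech carrying different emotions. Most speech emotion recognition studies therefore use global features (such as maximum and minimum values) and do not fully exploit the information in local features (such as the short-time energy and zero-crossing rate of individual frames). This paper proposes a method based on local feature optimization that further purifies each emotional speech sample: cluster analysis filters out the frames whose emotional characteristics are relatively weak, and statistics are then computed and classification is performed on the remaining frames to improve prediction accuracy. Experimental results show that classifying emotions on the optimized samples raises the average accuracy on three corpora by 5% to 17%. Further study suggests that this optimization may be better suited to emotion recognition scenarios with longer utterances.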
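The pipeline described in the abstract (frame-level local features, clustering to drop weak frames, then global statistics) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the feature set, or the rule for deciding which frames count as weak, so everything specific here is an assumption: k-means with two clusters, two local features (short-time energy and zero-crossing rate), and keeping the cluster with the higher mean energy. The function names `frame_features`, `select_salient_frames`, and `global_stats` are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def frame_features(signal, frame_len=400, hop=160):
    """Local features per frame: (short-time energy, zero-crossing rate)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def zscore(column):
    """Standardize one feature column so both features share a scale."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in column]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def select_salient_frames(feats, iters=20):
    """Cluster frames with 2-means and return the indices of the kept frames.

    Assumption (not from the paper): the cluster whose centroid has the
    higher energy is treated as emotionally salient. `feats` must be
    non-empty.
    """
    cols = [zscore(list(col)) for col in zip(*feats)]
    points = list(zip(*cols))
    # Deterministic start: the lowest-energy and highest-energy frames.
    centroids = [min(points), max(points)]
    labels = []
    for _ in range(iters):
        labels = [
            0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(mean(dim) for dim in zip(*members))
    keep = 1 if centroids[1][0] > centroids[0][0] else 0
    return [i for i, lab in enumerate(labels) if lab == keep]


def global_stats(feats):
    """Global features: max, min, mean, and std of each local feature."""
    stats = []
    for col in zip(*feats):
        stats.extend([max(col), min(col), mean(col), pstdev(col)])
    return stats
```

For one utterance, the kept frames would be `[feats[i] for i in select_salient_frames(feats)]`, and `global_stats` of that subset would form the input vector for a classifier. Standardizing the features before clustering is a design choice made here so that energy and zero-crossing rate contribute comparably to the distance.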