- 2018
Speech emotion recognition based on joint subspace learning and feature selection
Abstract: Traditional speech emotion recognition methods are trained and evaluated on a single corpus. In practice, however, the training and testing utterances often come from different corpora, and recognition performance drops drastically. To address this, a speech emotion recognition method based on joint subspace learning and feature selection is presented. The feature subspace is learned via a regression algorithm, with the l2,1-norm introduced for feature selection and the maximum mean discrepancy (MMD) used to reduce the feature divergence between different emotion corpora; these terms are optimized jointly to extract a more robust emotional feature representation. Evaluations on two public emotion corpora, EMO-DB and eNTERFACE, show that the method performs well under cross-corpus conditions and is more robust and efficient than classical transfer learning methods.
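The two regularizers named in the abstract can be sketched in a few lines. The NumPy illustration below (function names and the linear-kernel choice are assumptions for illustration, not the paper's implementation) shows the l2,1-norm, whose minimization drives whole rows of a projection matrix to zero and thereby selects features, and a linear-kernel MMD measuring the distance between the mean feature vectors of a source and a target corpus.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the l2 norms of the rows of W.
    Minimizing it zeroes out entire rows, so the surviving
    rows correspond to the selected features."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def linear_mmd(Xs, Xt):
    """Squared MMD under a linear kernel: squared distance
    between the mean feature vectors of the source corpus Xs
    and the target corpus Xt (samples in rows)."""
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(delta @ delta)

# Toy check: identical corpora give zero MMD.
Xs = np.random.default_rng(0).normal(size=(10, 4))
print(linear_mmd(Xs, Xs))  # → 0.0
```

In the joint objective described in the abstract, both quantities would appear as penalty terms alongside the regression loss, weighted by trade-off hyperparameters.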
[1] | HU H, XU M X, WU W. GMM supervector based SVM with spectral features for speech emotion recognition[C]//Proceedings of 2007 International Conference on Acoustics, Speech and Signal Processing (ICASSP). Honolulu, USA:IEEE, 2007:413-416. |
[2] | DENG J, ZHANG Z X, EYBEN F, et al. Autoencoder-based unsupervised domain adaptation for speech emotion recognition[J]. IEEE Signal Processing Letters, 2014, 21(9):1068-1072. |
[3] | SONG P, ZHENG W M, LIANG R Y. Speech emotion recognition based on sparse transfer learning method[J]. IEICE Transactions on Information and Systems, 2015, 98(7):1409-1412. |
[4] | YAN S C, XU D, ZHANG B Y, et al. Graph embedding and extensions:A general framework for dimensionality reduction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(1):40-51. |
[5] | NIE F P, HUANG H, CAI X, et al. Efficient and robust feature selection via joint l2,1-norms minimization[C]//Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada:NIPS, 2010:1813-1821. |
[6] | BURKHARDT F, PAESCHKE A, ROLFES M, et al. A database of German emotional speech[C]//Proceedings of INTERSPEECH. Lisbon, Portugal:ISCA, 2005:1517-1520. |
[7] | EYBEN F, WÖLLMER M, SCHULLER B. openSMILE:The Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze, Italy:ACM, 2010:1459-1462. |
[8] | SCHULLER B, STEIDL S, BATLINER A, et al. The INTERSPEECH 2010 paralinguistic challenge[C]//Proceedings of the 11th Annual Conference of the International Speech Communication Association. Makuhari, Japan:ISCA, 2010:2795-2798. |
[9] | HE R, TAN T N, WANG L, et al. l2,1 regularized correntropy for robust feature selection[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, USA:IEEE, 2012:2504-2511. |
[10] | HAN K, YU D, TASHEV I. Speech emotion recognition using deep neural network and extreme learning machine[C]//Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore:ISCA, 2014:223-227. |
[11] | KINNUNEN T, LI H Z. An overview of text-independent speaker recognition:From features to supervectors[J]. Speech Communication, 2010, 52(1):12-40. |
[12] | EL AYADI M, KAMEL M S, KARRAY F. Survey on speech emotion recognition:Features, classification schemes, and databases[J]. Pattern Recognition, 2011, 44(3):572-587. |
[13] | WEISS K, KHOSHGOFTAAR T M, WANG D D. A survey of transfer learning[J]. Journal of Big Data, 2016, 3(1):1-40. |
[14] | MARTIN O, KOTSIA I, MACQ B, et al. The eNTERFACE'05 audio-visual emotion database[C]//Proceedings of the 22nd International Conference on Data Engineering Workshops. Atlanta, USA:IEEE, 2006:8-8. |
[15] | HAN W J, LI H F, RUAN H B, et al. Review on speech emotion recognition[J]. Journal of Software, 2014, 25(1):37-50. (in Chinese) |
[16] | ABDELWAHAB M, BUSSO C. Supervised domain adaptation for emotion recognition from speech[C]//Proceedings of 2015 International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brisbane, Australia:IEEE, 2015:5058-5062. |
[17] | HASSAN A, DAMPER R, NIRANJAN M. On acoustic emotion recognition:Compensating for covariate shift[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(7):1458-1468. |
[18] | SONG P, ZHENG W M, OU S F, et al. Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization[J]. Speech Communication, 2016, 83:34-41. |