- 2018
Speech emotion recognition based on joint subspace learning and feature selection
Abstract: Traditional speech emotion recognition methods are trained and evaluated on a single corpus. In practice, however, the training and testing utterances often come from different corpora, and recognition performance drops drastically. To address this, a speech emotion recognition method based on joint subspace learning and feature selection is presented. The feature subspace is learned via a regression algorithm, with the l2,1-norm introduced for feature selection and the maximum mean discrepancy (MMD) used to reduce the feature divergence between different emotion corpora; these terms are optimized jointly to extract a more robust emotional feature representation. Evaluations on two public emotion corpora, EMO-DB and eNTERFACE, show that the method performs well under cross-corpus conditions and is more robust and efficient than classical transfer learning methods.
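The two regularizers named in the abstract can be sketched in a few lines. The NumPy illustration below (function names and the linear-kernel choice are assumptions for illustration, not the paper's implementation) shows the l2,1-norm, whose minimization drives whole rows of a projection matrix to zero and thereby selects features, and a linear-kernel MMD measuring the distance between the mean feature vectors of a source and a target corpus.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the l2 norms of the rows of W.
    Minimizing it zeroes out entire rows, so the surviving
    rows correspond to the selected features."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def linear_mmd(Xs, Xt):
    """Squared MMD under a linear kernel: squared distance
    between the mean feature vectors of the source corpus Xs
    and the target corpus Xt (samples in rows)."""
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(delta @ delta)

# Toy check: identical corpora give zero MMD.
Xs = np.random.default_rng(0).normal(size=(10, 4))
print(linear_mmd(Xs, Xs))  # → 0.0
```

In the joint objective described in the abstract, both quantities would appear as penalty terms alongside the regression loss, weighted by trade-off hyperparameters.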
[1] | HU H, XU M X, WU W. GMM supervector based SVM with spectral features for speech emotion recognition[C]//Proceedings of 2007 International Conference on Acoustics, Speech and Signal Processing (ICASSP). Honolulu, USA:IEEE, 2007:413-416. |
[2] | DENG J, ZHANG Z X, EYBEN F, et al. Autoencoder-based unsupervised domain adaptation for speech emotion recognition[J]. IEEE Signal Processing Letters, 2014, 21(9):1068-1072. |
[3] | SONG P, ZHENG W M, LIANG R Y. Speech emotion recognition based on sparse transfer learning method[J]. IEICE Transactions on Information and Systems, 2015, 98(7):1409-1412. |
[4] | YAN S C, XU D, ZHANG B Y, et al. Graph embedding and extensions:A general framework for dimensionality reduction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(1):40-51. |
[5] | NIE F P, HUANG H, CAI X, et al. Efficient and robust feature selection via joint l2,1-norms minimization[C]//Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, Canada:NIPS, 2010:1813-1821. |
[6] | BURKHARDT F, PAESCHKE A, ROLFES M, et al. A database of German emotional speech[C]//Proceedings of INTERSPEECH. Lisbon, Portugal:ISCA, 2005:1517-1520. |
[7] | EYBEN F, WÖLLMER M, SCHULLER B. openSMILE:The Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze, Italy:ACM, 2010:1459-1462. |
[8] | SCHULLER B, STEIDL S, BATLINER A, et al. The INTERSPEECH 2010 paralinguistic challenge[C]//Proceedings of the 11th Annual Conference of the International Speech Communication Association. Makuhari, Japan:ISCA, 2010:2795-2798. |
[9] | HE R, TAN T N, WANG L, et al. l2,1 regularized correntropy for robust feature selection[C]//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, USA:IEEE, 2012:2504-2511. |
[10] | HAN K, YU D, TASHEV I. Speech emotion recognition using deep neural network and extreme learning machine[C]//Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore:ISCA, 2014:223-227. |
[11] | KINNUNEN T, LI H Z. An overview of text-independent speaker recognition:From features to supervectors[J]. Speech Communication, 2010, 52(1):12-40. |
[12] | EL AYADI M, KAMEL M S, KARRAY F. Survey on speech emotion recognition:Features, classification schemes, and databases[J]. Pattern Recognition, 2011, 44(3):572-587. |
[13] | WEISS K, KHOSHGOFTAAR T M, WANG D D. A survey of transfer learning[J]. Journal of Big Data, 2016, 3(1):1-40. |
[14] | MARTIN O, KOTSIA I, MACQ B, et al. The eNTERFACE'05 audio-visual emotion database[C]//Proceedings of the 22nd International Conference on Data Engineering Workshops. Atlanta, USA:IEEE, 2006:8-8. |
[15] | HAN W J, LI H F, RUAN H B, et al. Review on speech emotion recognition[J]. Journal of Software, 2014, 25(1):37-50. (in Chinese) |
[16] | ABDELWAHAB M, BUSSO C. Supervised domain adaptation for emotion recognition from speech[C]//Proceedings of 2015 International Conference on Acoustics, Speech and Signal Processing (ICASSP). Brisbane, Australia:IEEE, 2015:5058-5062. |
[17] | HASSAN A, DAMPER R, NIRANJAN M. On acoustic emotion recognition:Compensating for covariate shift[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(7):1458-1468. |
[18] | SONG P, ZHENG W M, OU S F, et al. Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization[J]. Speech Communication, 2016, 83:34-41. |