Multimodal Sentiment Analysis Based on an Improved Self-MM Model
Abstract:
Early sentiment analysis relied on neural networks applied to a single modality such as text, images, or audio. Although these approaches perform well within their own modality, a single modality cannot fully capture people's emotions, so this paper combines information from multiple modalities for sentiment analysis. The Self-MM model already achieves good experimental results in this field, but it leaves room for improvement at the optimizer level. Building on Self-MM, this paper adopts the more advanced AdamW optimizer and validates the change on the public CMU-MOSI dataset. The experimental results show improvements of 0.12% and 0.43% in the Acc-7 and Acc-2 classification accuracies, respectively.
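To illustrate what a change at the optimizer level can mean, the following is a minimal sketch of a single AdamW update for one scalar parameter, shown next to Adam with an L2 penalty folded into the gradient. It is not the paper's training code: the function names `adamw_step` and `adam_l2_step` are hypothetical, the hyperparameter defaults are illustrative assumptions rather than the settings used in the experiments, and applying the decay before the Adam step is one common ordering.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter (t counts steps from 1).

    Hypothetical helper for illustration; defaults are assumptions.
    """
    # Decoupled weight decay: shrink the parameter directly, so the decay
    # never enters the moment estimates m and v.
    param = param * (1.0 - lr * weight_decay)
    # Standard Adam moment updates on the raw gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def adam_l2_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=1e-2):
    """One Adam update with the L2 penalty added to the gradient."""
    # Coupled decay: the penalty term is rescaled by the adaptive step size.
    grad = grad + weight_decay * param
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # With a zero task gradient, only the weight decay moves the parameter.
    print(adamw_step(1.0, 0.0, 0.0, 0.0, 1)[0])
    print(adam_l2_step(1.0, 0.0, 0.0, 0.0, 1)[0])
```

With a zero task gradient, the AdamW step shrinks the parameter by the factor `1 - lr * weight_decay`, while the coupled variant passes the penalty through the adaptive normalisation and takes a step of roughly `lr`. This difference in how regularisation interacts with the adaptive step size is the distinction the two helpers are meant to show.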