Multimodal Sentiment Analysis Based on Hierarchical Distance-Aware Contrastive Learning

DOI: 10.12677/csa.2025.155134, PP. 615-623

Keywords: Multimodal Sentiment Analysis, Cross-Modal Attention Mechanism, Contrastive Learning

Abstract:

Multimodal sentiment analysis (MSA) uses visual, textual, and audio data to improve the accuracy of sentiment analysis. Although multimodal information provides richer context, effectively handling the interaction and fusion of heterogeneous modalities remains a significant challenge. To address this, this paper proposes a multimodal sentiment analysis method based on hierarchical distance-aware contrastive learning (HDACL). Specifically, HDACL introduces a cross-modal attention mechanism to enable full interaction between the different modalities. In addition, we design a contrastive learning strategy guided by differences in sentiment-intensity distance, which further strengthens the consistency alignment of the multimodal representations. Evaluated on the CMU-MOSI multimodal sentiment analysis dataset, HDACL improves the Acc-2 and Acc-7 metrics by 0.7% and 0.8%, respectively.
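
To make the two ideas in the abstract concrete, the following is a minimal sketch, assuming a PyTorch implementation: a cross-modal attention block in which the text stream attends to another modality, and a contrastive loss whose soft targets are weighted by the distance between sentiment-intensity labels. All module names, dimensions, and the exact weighting scheme are illustrative assumptions rather than the authors' published code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    # Cross-modal attention: text queries attend to another modality's keys and values.
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, other):
        # text: (B, T_text, dim), other: (B, T_other, dim), already projected to a shared dim
        fused, _ = self.attn(query=text, key=other, value=other)
        return self.norm(text + fused)  # residual connection keeps the text stream intact

def distance_aware_contrastive_loss(z_text, z_other, intensity, temperature=0.1):
    # Soft InfoNCE over cross-modal pairs: samples whose sentiment intensities are
    # close receive larger target weight, distant pairs act as negatives.
    z_text = F.normalize(z_text, dim=-1)
    z_other = F.normalize(z_other, dim=-1)
    sim = z_text @ z_other.t() / temperature                 # (N, N) cross-modal similarities
    dist = (intensity[:, None] - intensity[None, :]).abs()   # pairwise label-distance matrix
    target = torch.softmax(-dist, dim=-1)                    # closer intensity -> larger weight
    return -(target * F.log_softmax(sim, dim=-1)).sum(dim=-1).mean()

On CMU-MOSI the intensity labels are continuous scores in [-3, 3], so the label-distance matrix above directly encodes how far apart two samples' sentiments are; this is only one plausible realization of "distance-aware" contrastive alignment, not the method's exact loss.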

