
Facial Expression Recognition Method Based on Feature Fusion and Multi-Attention

DOI: 10.12677/csa.2025.152029, PP. 13-22

Keywords: Facial Expression Recognition, Feature Fusion, Multi-Attention


Abstract:

Currently, the field of Facial Expression Recognition (FER) is attracting increasing attention from researchers. However, FER still faces numerous challenges; in particular, performance gains on real-world (in-the-wild) datasets remain limited. To address identity-information interference, intra-class variation, and inter-class similarity in FER, this paper proposes a facial expression recognition method based on feature fusion and a multi-attention mechanism. The method first employs a pyramid structure to integrate the multi-level semantic features extracted by the backbone network. Multi-attention modules then strengthen the model's global and local learning ability, adaptively capturing key facial information. Finally, an orthogonal decomposition mapping is applied to the fused features to further remove redundant information between features, especially identity interference, yielding expression features that are highly discriminative and generalize well. The proposed method achieves excellent results on the RAF-DB and AffectNet datasets, and extensive experiments demonstrate its effectiveness.
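
For readers who want a concrete picture of the three stages described above, the following is a minimal PyTorch sketch of (1) pyramid-style fusion of multi-level backbone features, (2) a combined global (channel) and local (spatial) attention block, and (3) an orthogonal-decomposition step that removes the component of the pooled expression feature lying along an identity embedding. All module names, channel widths, and the specific attention design are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class PyramidFusion(nn.Module):
        # Project multi-level backbone features to a common width, upsample them to the
        # shallowest resolution, and sum them (a simple pyramid-style fusion; assumed design).
        def __init__(self, in_dims=(256, 512, 1024), out_dim=512):
            super().__init__()
            self.proj = nn.ModuleList([nn.Conv2d(d, out_dim, kernel_size=1) for d in in_dims])

        def forward(self, feats):
            # feats: list of (B, C_i, H_i, W_i) maps, ordered shallow to deep
            target = feats[0].shape[-2:]
            return sum(
                F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
                for p, f in zip(self.proj, feats)
            )


    class GlobalLocalAttention(nn.Module):
        # Channel (global) attention followed by spatial (local) attention.
        def __init__(self, dim=512, reduction=16):
            super().__init__()
            self.channel = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(dim, dim // reduction, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(dim // reduction, dim, kernel_size=1), nn.Sigmoid(),
            )
            self.spatial = nn.Sequential(nn.Conv2d(dim, 1, kernel_size=7, padding=3), nn.Sigmoid())

        def forward(self, x):
            x = x * self.channel(x)   # reweight channels (global view)
            x = x * self.spatial(x)   # reweight spatial positions (local facial regions)
            return x


    def remove_identity_component(expr_feat, id_feat, eps=1e-8):
        # Subtract the projection of the expression feature onto the unit-normalized
        # identity embedding, keeping only the component orthogonal to identity
        # (one simple reading of "orthogonal decomposition mapping").
        id_unit = F.normalize(id_feat, dim=-1, eps=eps)
        proj = (expr_feat * id_unit).sum(dim=-1, keepdim=True) * id_unit
        return expr_feat - proj


    # Illustrative usage with made-up shapes:
    # feats = [torch.randn(2, 256, 28, 28), torch.randn(2, 512, 14, 14), torch.randn(2, 1024, 7, 7)]
    # x = GlobalLocalAttention()(PyramidFusion()(feats))    # (2, 512, 28, 28)
    # expr = x.mean(dim=(-2, -1))                           # pooled expression feature
    # ident = torch.randn(2, 512)                           # identity embedding, e.g. from a face-recognition model
    # expr_clean = remove_identity_component(expr, ident)   # (2, 512), orthogonal to the identity direction

The orthogonal step is sketched as a per-sample vector projection; the paper may realize it differently (for example, as a learned mapping), so treat this only as an intuition aid for why subtracting the identity-aligned component can suppress identity interference.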

