Artificial Intelligence and Robotics Research 2025

A Crowd Counting Method Based on Multi-Attention Feature Fusion

DOI: 10.12677/airr.2025.143052, PP. 527-535

李俊恩, 谭显洋, 陆许明

Keywords: Crowd Counting, Attention Mechanism, Multi-Scale Features, Crowd Density Estimation


Abstract:

To address the highly non-uniform crowd distribution, complex background interference, and severe occlusion found in crowded scenes, this paper proposes a feature fusion network built on the MobileNet V3 classification model. The network first extracts four feature maps of different scales from the MobileNet V3 backbone. Each feature map is processed by a Hybrid Attention Module (HAM), which combines channel attention, edge attention, spatial attention, and dynamic convolution. The processed feature maps are upsampled to a common spatial resolution, concatenated along the channel dimension, and compressed by a 1 × 1 convolution into a single fused feature map, from which a density map is regressed and the crowd count is obtained. Experiments on three challenging datasets, ShanghaiTech, NWPU, and QNRF, show that the proposed method significantly outperforms existing mainstream methods in both counting accuracy and robustness.
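The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Random arrays stand in for the four MobileNet V3 stage outputs (the channel counts and strides are hypothetical), and untrained random weights replace learned parameters. Each HAM branch is an assumed, simplified stand-in: an SE-style channel gate, a Sobel-based edge gate, a CBAM-style spatial gate built from channel-wise mean and max maps, and a softmax-weighted mixture of candidate 1 × 1 kernels for dynamic convolution. The branch order inside `ham` and the choice of the finest scale as the upsampling target are also assumptions. The count is taken as the sum of the density map, the usual convention in density-based counting.

```python
import numpy as np

rng = np.random.default_rng(0)  # untrained random weights stand in for learned ones


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1x1(x, w):
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def channel_attention(x, reduction=4):
    """SE-style gate (assumed): global average pool -> FC -> ReLU -> FC -> sigmoid."""
    c = x.shape[0]
    hidden = max(c // reduction, 1)
    w1 = rng.standard_normal((hidden, c)) * 0.1
    w2 = rng.standard_normal((c, hidden)) * 0.1
    z = x.mean(axis=(1, 2))                       # (C,) channel descriptor
    gate = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # (C,) per-channel weights
    return x * gate[:, None, None]


def edge_attention(x):
    """Hypothetical edge gate: Sobel gradient magnitude of the channel-mean map."""
    m = np.pad(x.mean(axis=0), 1, mode="edge")
    gx = (m[:-2, 2:] + 2 * m[1:-1, 2:] + m[2:, 2:]) - (m[:-2, :-2] + 2 * m[1:-1, :-2] + m[2:, :-2])
    gy = (m[2:, :-2] + 2 * m[2:, 1:-1] + m[2:, 2:]) - (m[:-2, :-2] + 2 * m[:-2, 1:-1] + m[:-2, 2:])
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # Residual gate: locations with above-average gradient are boosted more.
    return x * (1.0 + sigmoid(mag - mag.mean()))[None]


def spatial_attention(x):
    """Simplified CBAM-style gate: 1x1 mix of channel-wise mean and max maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    w = rng.standard_normal((1, 2)) * 0.1
    return x * sigmoid(conv1x1(desc, w))               # (1, H, W) gate broadcasts over C


def dynamic_conv1x1(x, k=4):
    """Simplified dynamic convolution: input-conditioned softmax mix of K 1x1 kernels."""
    c = x.shape[0]
    kernels = rng.standard_normal((k, c, c)) * 0.1
    w_att = rng.standard_normal((k, c)) * 0.1
    logits = w_att @ x.mean(axis=(1, 2))              # (K,) kernel scores
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                              # softmax over candidate kernels
    w = np.tensordot(alpha, kernels, axes=1)          # (C, C) aggregated kernel
    return conv1x1(x, w)


def ham(x):
    """Hypothetical HAM ordering: channel -> edge -> spatial attention -> dynamic conv."""
    return dynamic_conv1x1(spatial_attention(edge_attention(channel_attention(x))))


def upsample_nearest(x, factor):
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_and_count(features, fused_channels=32):
    """features: list of (C_i, H_i, W_i) maps, finest first; sizes must divide evenly."""
    target_h = features[0].shape[1]
    aligned = [upsample_nearest(ham(f), target_h // f.shape[1]) for f in features]
    stacked = np.concatenate(aligned, axis=0)             # concatenate along channels
    w_fuse = rng.standard_normal((fused_channels, stacked.shape[0])) * 0.1
    fused = conv1x1(stacked, w_fuse)                      # 1x1 conv compresses channels
    w_head = rng.standard_normal((1, fused_channels)) * 0.1
    density = np.maximum(conv1x1(fused, w_head), 0.0)[0]  # non-negative density map
    return density, float(density.sum())                  # count = sum of density


# Hypothetical stage outputs for a 256x256 image at strides 4, 8, 16, 32.
shapes = [(24, 64, 64), (40, 32, 32), (112, 16, 16), (160, 8, 8)]
feats = [rng.standard_normal(s) for s in shapes]
density, count = fuse_and_count(feats)
print(density.shape, round(count, 2))
```

Because every weight is random, the printed count carries no meaning. The sketch only traces the pipeline: per-scale attention, nearest-neighbour upsampling to the finest map (1/4 of the input here), channel concatenation, 1 × 1 channel compression, and a non-negative single-channel density map whose sum gives the count.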

