YOLO-Vortex: An Underwater Object Detection Model Based on a Vortex Aggregation Network
Abstract:
Underwater object detection has important applications in ocean exploration, ecological protection, and underwater robot navigation. However, the underwater environment is complex: uneven lighting, suspended particles, and low-contrast imagery degrade traditional object detection methods, which often perform poorly underwater, particularly on noisy data. To address this, we propose an improved underwater object detection model that takes YOLOv7 as the baseline and optimizes the key modules that limit it in underwater environments. Specifically, we propose a vortex aggregation network module to disrupt noise in the data, and place a spatial attention mechanism before it to help the network focus on important features and suppress irrelevant noise. To reduce the information loss that can occur during downsampling, we propose the Space-To-Depth Pooling (STD-MP) module, which converts spatial features into depth features and combines this with max pooling to perform downsampling. Finally, we optimize the loss function. Experimental results show that our model improves mAP by 4.2% over the baseline model.
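As a rough illustration of the downsampling idea behind STD-MP, the framework-free Python sketch below shows the two primitives the abstract names: a space-to-depth rearrangement, which halves the spatial resolution while keeping every value, and 2 x 2 max pooling. The abstract does not specify how the module combines them, so the channel-wise concatenation in `std_mp_sketch`, the block size of 2, and all function names are assumptions made for illustration, not the authors' implementation.

```python
def space_to_depth(x, block=2):
    """Rearrange a C x H x W map into (C*block*block) x (H/block) x (W/block).

    No value is discarded: each block x block spatial neighbourhood is
    moved into the channel dimension.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    assert height % block == 0 and width % block == 0
    out = []
    for dy in range(block):
        for dx in range(block):
            for c in range(channels):
                # Take every block-th pixel, starting at offset (dy, dx).
                out.append([[x[c][i][j] for j in range(dx, width, block)]
                            for i in range(dy, height, block)])
    return out


def max_pool(x, block=2):
    """Non-overlapping block x block max pooling on a C x H x W map."""
    height, width = len(x[0]), len(x[0][0])
    assert height % block == 0 and width % block == 0
    return [[[max(ch[i + di][j + dj]
                  for di in range(block) for dj in range(block))
              for j in range(0, width, block)]
             for i in range(0, height, block)]
            for ch in x]


def std_mp_sketch(x):
    """Assumed combination: concatenate both results along the channel axis."""
    return space_to_depth(x) + max_pool(x)


# One channel, 4 x 4 input.
feature = [[[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]]
out = std_mp_sketch(feature)
```

For this input, `out` has five 2 x 2 channels: four from the space-to-depth branch, which together contain all sixteen input values, and one from max pooling, which keeps only the largest value in each 2 x 2 window. In a real network, a learned convolution would typically follow to mix and reduce these channels.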