An Improved YOLOv10n Algorithm for Object Detection in UAV Aerial Images
Abstract:
Object detection in unmanned aerial vehicle (UAV) aerial imagery has significant application value in both civilian and military fields. Small targets, large scale variations, and background interference in such imagery lead to low detection accuracy and imprecise localization. To address these problems, an improved YOLOv10n algorithm for object detection in UAV aerial images is proposed. First, the C2f module is improved by fusing it with recursive gated convolution (gnConv), yielding a C2f-GConv module that better adapts to the deformation and scale changes of objects in aerial images. Second, the backbone network is replaced with EfficientFormerV2, which keeps a size and speed similar to MobileNetV2 while achieving about 4% higher top-1 accuracy, markedly improving the efficiency and performance of the model. Comparative and ablation experiments on the VisDrone2019 dataset show that mAP50 increases by 3.2% over the baseline model and that the detection speed reaches 90 FPS, which meets real-time detection requirements. In comparative experiments, the proposed algorithm also outperforms current mainstream algorithms.
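As background for the mAP50 figure, a prediction is conventionally counted as correct at this threshold when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not the evaluation code used in the paper; the function names `iou` and `is_true_positive_at_50` and the example boxes are hypothetical. It also illustrates why small targets are sensitive to localization error: a 2-pixel shift pushes a 10×10 box below the threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive_at_50(pred_box, gt_box):
    """Assumed matching rule behind mAP50: IoU of at least 0.5."""
    return iou(pred_box, gt_box) >= 0.5


# Hypothetical 10x10 ground-truth box, typical of a small aerial target.
gt = (10, 10, 20, 20)
print(is_true_positive_at_50((11, 11, 21, 21), gt))  # 1-pixel shift
print(is_true_positive_at_50((12, 12, 22, 22), gt))  # 2-pixel shift
```

With these example boxes, the 1-pixel shift gives an IoU of 81/119 and still counts as a match, while the 2-pixel shift gives 64/136 and does not.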