Research on Obstacle Detection under Complex Road Conditions Based on OSTD-YOLO
Abstract:
With the continuous advancement of autonomous driving technology, missed and false detections have become a critical problem for obstacle detection in complex road conditions that contain small targets, occluded targets, and targets with large variations in scale. To address this, an algorithm named OSTD-YOLO (Occlusion and Small Obstacle Detection) is designed. The algorithm is built on YOLO11n as the base model and introduces two main improvements. First, an Enhanced Micro-Target Feature Pyramid (EMTFP) module is constructed: based on the PAFPN architecture, it introduces SPDConv to refine the P2 feature layer, effectively improving feature extraction for small targets; in addition, inspired by the CSP idea and OmniKernel, a CSP-OmniKernel module is developed to integrate the features extracted at different levels and make the feature information more refined. Second, a DyHead attention detection head is introduced at the model output, providing adaptive feature enhancement across the spatial, scale, and task dimensions and comprehensively improving the model's perception accuracy for obstacles under complex road conditions. Experiments on the public BDD100K dataset show that, compared with the original YOLO11n model, OSTD-YOLO improves mAP50 by nearly 5 percentage points with only 3.58 M parameters, achieving the best balance between parameter count and accuracy among the compared methods. A series of experiments demonstrates that OSTD-YOLO can effectively handle obstacle detection in complex road conditions, providing strong support for the practical deployment of autonomous driving technology.
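To make the SPDConv refinement of the P2 layer more concrete, the sketch below shows a minimal space-to-depth plus non-strided convolution block in PyTorch, in the spirit of SPD-Conv. It is an illustrative assumption rather than the exact module used in OSTD-YOLO; the channel sizes, kernel size, and normalization/activation choices are placeholders.

# Minimal sketch of an SPD-Conv style block (space-to-depth + stride-1 convolution),
# the kind of operation the abstract describes for refining the P2 feature layer.
# Illustrative assumption only; not the paper's exact implementation.
import torch
import torch.nn as nn


class SPDConv(nn.Module):
    """Space-to-depth downsampling followed by a non-strided convolution.

    Rearranging each 2x2 spatial block into the channel dimension halves the
    resolution without discarding pixels, which is why this is often preferred
    over strided convolution or pooling for small-object features.
    """

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # After space-to-depth with scale 2, the channel count grows 4x.
        self.conv = nn.Conv2d(4 * in_channels, out_channels, kernel_size,
                              stride=1, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Space-to-depth: (B, C, H, W) -> (B, 4C, H/2, W/2)
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))


if __name__ == "__main__":
    # Hypothetical P2-level feature map (stride 4) being downsampled toward P3.
    p2 = torch.randn(1, 64, 160, 160)
    block = SPDConv(in_channels=64, out_channels=128)
    print(block(p2).shape)  # torch.Size([1, 128, 80, 80])

Because the downsampling keeps every pixel (only reorganizing it into channels), fine detail from small obstacles survives into the deeper layers, which is consistent with the small-target motivation stated in the abstract.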