A Road Scene Image Instance Segmentation Method Based on an Improved Object Detection Algorithm
Abstract:
Instance segmentation is an important branch of image segmentation and a key research topic in computer vision, with wide applications in fields such as autonomous driving and security surveillance. Road scenes, however, are typically complex, diverse, and cluttered, which makes them particularly challenging to segment. To address the difficulty, low accuracy, and imprecise localization of instance segmentation in road scene images, this paper proposes a road scene instance segmentation algorithm based on an improved YOLOv5 (You Only Look Once version 5). Taking YOLOv5 as the base model, part of the standard convolutions in the Head module are replaced with RFAConv (Receptive-Field Attention Convolution). RFAConv addresses the parameter-sharing limitation of the convolution kernel by weighting each feature in the receptive field according to its importance, at an almost negligible cost in extra computation and parameters, allowing the network to capture and fuse image features more effectively and thereby improving segmentation accuracy and robustness. In addition, Shape-IoU replaces CIoU (Complete-IoU), the original bounding-box loss of YOLOv5; by computing the loss with attention to the shape and scale of the bounding box itself, it makes box regression more accurate and effectively improves detection performance over existing methods. Experimental results show that the improved model reaches a segmentation accuracy of 33.8% mAP50 (mean Average Precision at an IoU threshold of 0.5), a 1.2% gain over YOLOv5s, and thus completes road scene image segmentation tasks more efficiently.
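To make the RFAConv mechanism concrete, the following PyTorch sketch illustrates the receptive-field attention idea described in the RFAConv paper (arXiv: 2304.03198): per-window attention weights are generated from a window summary, the unfolded k×k receptive-field features are re-weighted, and a stride-k convolution then mixes each window so that distinct spatial windows no longer share one undifferentiated kernel response. This is a minimal sketch, not the paper's released implementation; the class name RFAConvSketch and the layer choices (average pooling for window summaries, BatchNorm + SiLU after the mixing convolution) are our assumptions.

```python
import torch
import torch.nn as nn


class RFAConvSketch(nn.Module):
    """Minimal sketch of Receptive-Field Attention Convolution (RFAConv).

    Each k*k sliding window gets its own attention weights, so the
    effective kernel response varies across spatial positions instead
    of being fully parameter-shared as in a standard convolution.
    """

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, stride: int = 1):
        super().__init__()
        self.k, self.stride = k, stride
        # Attention branch: summarize each window, then predict k*k weights per channel.
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, padding=k // 2, stride=stride),
            nn.Conv2d(in_ch, in_ch * k * k, kernel_size=1, groups=in_ch, bias=False),
        )
        # Feature branch: extract the raw k*k receptive-field features.
        self.unfold = nn.Unfold(kernel_size=k, padding=k // 2, stride=stride)
        # Mixing conv applied on the rearranged (h*k, w*k) map with stride k.
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=k, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weight = self.get_weight(x)                      # (b, c*k*k, h, w)
        h, w = weight.shape[-2:]
        # Softmax over the k*k positions of each receptive field.
        attn = weight.view(b, c, self.k * self.k, h, w).softmax(dim=2)
        feat = self.unfold(x).view(b, c, self.k * self.k, h, w)
        weighted = attn * feat                           # (b, c, k*k, h, w)
        # Rearrange each weighted window into a k*k spatial tile.
        weighted = weighted.view(b, c, self.k, self.k, h, w)
        weighted = weighted.permute(0, 1, 4, 2, 5, 3).reshape(b, c, h * self.k, w * self.k)
        return self.conv(weighted)
```

A quick shape check such as RFAConvSketch(64, 128)(torch.randn(1, 64, 80, 80)) returns a (1, 128, 80, 80) tensor, so a module of this form can stand in wherever the Head uses a stride-1 3×3 convolution.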
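The Shape-IoU loss referred to above is defined in Zhang and Zhang (arXiv: 2312.17663) along the following lines; we reproduce it here as a reference sketch in that paper's notation, where (x_c, y_c, w, h) and the gt-superscripted counterparts are the predicted and ground-truth box center and size, c is the diagonal of the smallest enclosing box, scale is a dataset-dependent scale factor, and θ = 4:

```latex
% Shape weights derived from the ground-truth box alone
ww = \frac{2\,(w^{gt})^{scale}}{(w^{gt})^{scale} + (h^{gt})^{scale}}, \qquad
hh = \frac{2\,(h^{gt})^{scale}}{(w^{gt})^{scale} + (h^{gt})^{scale}}

% Shape-weighted center-distance term
distance^{shape} = hh \cdot \frac{(x_c - x_c^{gt})^2}{c^2}
                 + ww \cdot \frac{(y_c - y_c^{gt})^2}{c^2}

% Shape-discrepancy term
\omega_w = hh \cdot \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \qquad
\omega_h = ww \cdot \frac{|h - h^{gt}|}{\max(h, h^{gt})}, \qquad
\Omega^{shape} = \sum_{t = w,\, h} \left(1 - e^{-\omega_t}\right)^{\theta}

% Final bounding-box loss
L_{Shape\text{-}IoU} = 1 - IoU + distance^{shape} + 0.5 \cdot \Omega^{shape}
```

Unlike CIoU, whose distance and aspect-ratio penalties do not account for the box's own geometry, the ww and hh weights scale each penalty by the ground-truth box's own shape and scale, which is what makes the regression described above more precise.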