Research on an Infrared Image Target Recognition Algorithm Based on Improved YOLOv3
Abstract:
To address the low accuracy of pedestrian and vehicle detection in nighttime autonomous driving, this paper proposes an infrared image target recognition algorithm based on an improved YOLOv3. First, the algorithm modifies the original residual unit and increases the number of convolutions applied to large-size images in the backbone to strengthen feature extraction, and it replaces the subsequent conventional convolutions with depthwise separable convolutions to reduce the number of model parameters and increase running speed. Second, the feature fusion structure used in multi-scale feature fusion is replaced with the PANet structure to improve the utilization of low-level information. Finally, Distance-IoU (DIoU) is adopted as the anchor loss function to accelerate model convergence. Test results on the FLIR image dataset show that, with the model size almost unchanged, the improved YOLOv3 model achieves better precision and recall. Compared with YOLOv3, the results improve by 2.94% and 3.12% on the pedestrian and car classes, respectively, and the average AP improves by 3.03%. The experiments show that the improved method raises detection accuracy while also reducing the model size and increasing detection speed.
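As an illustration of the DIoU term mentioned in the abstract, the following minimal sketch computes the Distance-IoU loss for a single pair of axis-aligned boxes, using the standard definition: one minus the IoU, plus the squared distance between box centres divided by the squared diagonal of the smallest enclosing box. The function name `diou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for this example. It is not the paper's implementation, which would operate on batched tensors inside the YOLOv3 training loop.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centres.
    dx = (ax1 + ax2) / 2.0 - (bx1 + bx2) / 2.0
    dy = (ay1 + ay2) / 2.0 - (by1 + by2) / 2.0
    rho2 = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both inputs.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    c2 = enc_w * enc_w + enc_h * enc_h

    penalty = rho2 / c2 if c2 > 0 else 0.0
    return 1.0 - iou + penalty
```

Unlike a plain IoU loss, the centre-distance penalty stays non-zero for boxes that do not overlap, so the loss still distinguishes near misses from far misses. This is the property usually credited with speeding up convergence.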