Application of Lightweight Deep Learning Models to Pedestrian Detection
Abstract:
Pedestrian detection and tracking are central to the field of target tracking and are widely used in assisted driving, safety monitoring, and other forms of pedestrian analysis. Multi-target tracking faces a range of challenges, so the algorithms must be both real-time and highly accurate. This study proposes a new pedestrian tracking model. In the pedestrian feature modeling stage, a YOLOv4-tiny network initialized with weights pretrained on the COCO dataset is adapted to the MOT dataset through transfer learning. To handle deformation and occlusion of small parts of the target, a deep classification tracker is introduced that combines MeanShift with a Kalman filter. Target prediction is achieved by fusing the back-projection image and object contours with the Kalman filter's linear observation model. Experimental results show that the model can track targets over long periods in complex environments with good tracking performance, achieving a multi-target tracking accuracy of 57.6% and a target localization precision of 82.1%.
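The abstract names the tracking ingredients (a back-projection image, MeanShift, and a Kalman filter with a linear observation model) but not how they are wired together. The following is a minimal pure-Python sketch of one common way to combine them, not the authors' implementation. It assumes a constant-velocity motion model, frames supplied as 2-D grids of colour-histogram bin indices, and hypothetical noise values `q` and `r`; it omits the contour fusion and the deep classification stage. On each frame the Kalman filter predicts the target centre, MeanShift refines that prediction on the back-projection, and the refined centre is fed back as the Kalman measurement. The helper names (`back_project`, `AxisKalman`, `track_step`, `make_frame`) are illustrative only.

```python
from dataclasses import dataclass, field


def back_project(bin_map, hist):
    """Replace each pixel's histogram-bin index with the target model's weight for that bin."""
    return [[hist[b] for b in row] for row in bin_map]


def mean_shift(weights, cx, cy, half_w, half_h, max_iter=20, eps=0.5):
    """Shift a rectangular window to the weighted centroid of `weights` until the move is < eps."""
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        x0, x1 = max(0, round(cx - half_w)), min(cols, round(cx + half_w) + 1)
        y0, y1 = max(0, round(cy - half_h)), min(rows, round(cy + half_h) + 1)
        total = sx = sy = 0.0
        for y in range(y0, y1):
            for x in range(x0, x1):
                w = weights[y][x]
                total += w
                sx += w * x
                sy += w * y
        if total == 0:  # no target evidence inside the window: keep the current centre
            break
        nx, ny = sx / total, sy / total
        shift = abs(nx - cx) + abs(ny - cy)
        cx, cy = nx, ny
        if shift < eps:
            break
    return cx, cy


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis; state = [position, velocity]."""
    pos: float
    vel: float = 0.0
    P: list = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])
    q: float = 0.01  # process-noise scale (hypothetical value)
    r: float = 1.0   # measurement-noise variance (hypothetical value)

    def predict(self, dt=1.0):
        # State transition F = [[1, dt], [0, 1]]; covariance P <- F P F^T + q * I
        self.pos += dt * self.vel
        (p00, p01), (p10, p11) = self.P
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.pos

    def update(self, z):
        # Linear observation model H = [1, 0]: only the position is measured
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r               # innovation variance
        k0, k1 = p00 / s, p10 / s      # Kalman gain
        innov = z - self.pos
        self.pos += k0 * innov
        self.vel += k1 * innov
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.pos


def track_step(kx, ky, bin_map, hist, half_w, half_h):
    """One frame: Kalman predict -> MeanShift from the prediction -> Kalman update."""
    px, py = kx.predict(), ky.predict()
    weights = back_project(bin_map, hist)
    mx, my = mean_shift(weights, px, py, half_w, half_h)
    return kx.update(mx), ky.update(my)


def make_frame(cx, cy, size=40, half=2):
    """Synthetic frame of bin indices: bin 1 inside a square target, bin 0 elsewhere."""
    return [[1 if abs(x - cx) <= half and abs(y - cy) <= half else 0
             for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    hist = [0.0, 1.0]  # background bin carries no weight, target bin carries all of it
    kx, ky = AxisKalman(pos=10.0), AxisKalman(pos=20.0)
    for t in range(1, 11):
        true_x = 10 + 2 * t  # target moves 2 px/frame to the right
        x, y = track_step(kx, ky, make_frame(true_x, 20), hist, half_w=4, half_h=4)
        print(f"t={t} true_x={true_x} est=({x:.2f}, {y:.2f})")
```

Because x and y are modelled with independent noise, the four-state constant-velocity filter splits exactly into two two-state filters, which keeps the sketch dependency-free. On real video, OpenCV's `cv2.calcBackProject`, `cv2.meanShift`, and `cv2.KalmanFilter` applied to HSV frames would be the more usual building blocks.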