Improved PointPillars 3D Object Detection Based on an Attention Mechanism
Abstract:
To address the low detection accuracy of traditional 3D point cloud object detection algorithms on small objects, an improved PointPillars method based on a spatial attention mechanism is proposed. First, additional point cloud feature representations are added to the pillar feature network to enrich the feature encoding and improve the representational capacity of each point. Second, a spatial attention mechanism recomputes the feature weights of the encoded spatial locations on the pseudo-image, which strengthens feature extraction and improves detection performance. Finally, the improved algorithm is validated on the public KITTI dataset. The experimental results show that the method accurately detects small pedestrian and cyclist objects while maintaining stable performance on large car objects. In addition, at the moderate difficulty level, the mean average precision (mAP) over the three categories reaches 62.07%, 68.85%, and 70.02% in the 3D, bird's-eye view (BEV), and average orientation similarity (AOS) modes, respectively, each a considerable improvement over the algorithm before modification.
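The spatial attention step on the pseudo-image can be illustrated with a minimal NumPy sketch. It assumes a CBAM-style spatial attention module (channel-wise average and max pooling, a single k×k convolution, then a sigmoid). The abstract does not specify the exact module, so the function names, the 7×7 kernel size, and the tensor shapes below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, kernel):
    """Zero-padded 2D cross-correlation with one output channel.

    x:      (C_in, H, W) input feature map
    kernel: (C_in, k, k) filter, k odd
    Returns an (H, W) map.
    """
    _, k, _ = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Weighted sum over input channels for kernel offset (i, j).
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out


def spatial_attention(pseudo_image, kernel):
    """Reweight each BEV cell of a (C, H, W) pseudo-image (hypothetical helper).

    Assumption: CBAM-style spatial attention, not confirmed by the source.
    """
    avg_map = pseudo_image.mean(axis=0, keepdims=True)   # (1, H, W)
    max_map = pseudo_image.max(axis=0, keepdims=True)    # (1, H, W)
    descriptor = np.concatenate([avg_map, max_map], axis=0)  # (2, H, W)
    weights = sigmoid(conv2d_same(descriptor, kernel))   # (H, W), values in [0, 1]
    return pseudo_image * weights[None], weights


# Toy example: 64-channel pseudo-image on a 16 x 16 BEV grid (assumed sizes).
rng = np.random.default_rng(0)
pseudo_image = rng.standard_normal((64, 16, 16))
kernel = 0.1 * rng.standard_normal((2, 7, 7))  # would be learned in practice
attended, weights = spatial_attention(pseudo_image, kernel)
```

Each cell of `weights` scales all channels at that bird's-eye-view location, so cells that the learned filter scores highly keep more of their features than the surrounding cells. In a trained network the kernel would be learned jointly with the detector instead of sampled at random.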