Point cloud-based 3D object detection is a key technology in autonomous driving and mobile robot perception. However, because point clouds are sparse and irregular, existing methods perform poorly on occluded objects and on small objects at long distances. This paper proposes a novel Pillar Feature Encoder that addresses the feature encoding problem in pillar-based 3D object detection and improves the detection of occluded and small objects, especially at long distances. The method converts the point cloud into pillar features through voxelization and applies two convolutional neural network branches: Point Feature Encoding, which extracts point features within each local pillar, and Pillar Feature Encoding, which extracts global pillar features. The outputs of the two branches are fused to compensate for the information lost to occlusion, which improves detection accuracy for occluded objects. A multi-attention mechanism strengthens the focus on key point features and learns channel weights, which improves the detection of small objects at long distances. We integrated the encoder into the PointPillars framework and used the KITTI dataset for training and testing. The results show that the improved algorithm significantly increases the average precision (AP) of 3D detection for the Car, Pedestrian, and Cyclist classes on KITTI and performs well on occluded objects and on small targets at long distances, which supports the effectiveness of the proposed method.
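The pipeline described above (pillarization, a point branch, a pillar branch, fusion, and channel weighting) can be illustrated with a small NumPy sketch. It is not the paper's implementation: the abstract does not specify layer types, feature dimensions, or the form of the attention, so everything here is an assumption for illustration. In particular, the function names `pillarize`, `encode_pillars`, and `channel_attention`, the 4 m x 4 m grid, the random linear layers standing in for trained convolutions, the centroid-plus-count pillar summary, and the simple sigmoid channel gate are all hypothetical.

```python
import numpy as np


def pillarize(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), pillar_size=1.0):
    """Group points of shape (N, 3) into vertical pillars on an x-y grid.

    Returns a dict mapping a grid cell (ix, iy) to an array of its points.
    Points outside the grid are dropped. Grid extents are assumed values.
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    cells = {}
    for p, i, j in zip(points[inside], ix[inside], iy[inside]):
        cells.setdefault((int(i), int(j)), []).append(p)
    return {cell: np.stack(pts) for cell, pts in cells.items()}


def encode_pillars(pillars, w_point, w_pillar):
    """Two-branch encoding followed by fusion (illustrative only).

    Point branch: shared linear layer + ReLU per point, max-pooled per pillar.
    Pillar branch: linear layer + ReLU on a pillar summary (centroid, count).
    Fusion: concatenation of the two branch outputs along the channel axis.
    """
    cells = sorted(pillars)
    point_feats, pillar_feats = [], []
    for cell in cells:
        pts = pillars[cell]
        hidden = np.maximum(pts @ w_point, 0.0)           # (n_points, C1)
        point_feats.append(hidden.max(axis=0))            # (C1,)
        summary = np.concatenate([pts.mean(axis=0), [len(pts)]])  # (4,)
        pillar_feats.append(np.maximum(summary @ w_pillar, 0.0))  # (C2,)
    fused = np.concatenate(
        [np.stack(point_feats), np.stack(pillar_feats)], axis=1
    )
    return cells, fused


def channel_attention(feats):
    """Gate each channel with a sigmoid of its mean over all pillars.

    A stand-in for learned channel attention; no parameters are trained.
    """
    squeeze = feats.mean(axis=0)
    weights = 1.0 / (1.0 + np.exp(-squeeze))
    return feats * weights, weights


rng = np.random.default_rng(0)
points = np.array([
    [0.2, 0.3, 0.1],
    [0.8, 0.6, 0.4],   # shares a pillar with the first point
    [2.5, 3.1, 0.0],   # a pillar of its own
    [9.0, 9.0, 0.0],   # outside the grid, dropped
])
pillars = pillarize(points)
cells, fused = encode_pillars(
    pillars, rng.normal(size=(3, 8)), rng.normal(size=(4, 8))
)
attended, weights = channel_attention(fused)
print(cells, fused.shape, attended.shape)
```

In a real detector the random matrices would be replaced by trained layers, and the fused pillar features would be scattered back onto a 2D pseudo-image for the PointPillars backbone; the sketch stops before that step.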