Survey on 3D Object Detection Based on Deep Learning

DOI: 10.12677/jisp.2025.142017, PP. 173-184

Keywords: Object Detection, 3D Vehicle Detection, Deep Learning, Computer Vision


Abstract:

Deep-learning-based 3D object detection is important in fields such as autonomous driving and robot navigation. The technology still faces several challenges, including high computational complexity when processing large-scale 3D data and difficulty detecting small objects, which limit its further development and wider application. To address these issues, this article introduces commonly used datasets such as KITTI and NuScenes and presents a categorized analysis of 3D object detection methods based on images, point clouds, and multi-sensor fusion. Image-based methods are limited by insufficient depth information and achieve lower detection accuracy; point cloud-based methods use depth information and show a clear accuracy advantage; multi-sensor fusion methods deliver stronger detection performance; and methods based on Transformers and graph neural networks (GNNs) advance the field through global context modeling and spatial relationship reasoning. By evaluating and comparing mainstream models on the KITTI and NuScenes datasets, the article analyzes how the models differ in detection accuracy and in adaptability to complex scenes. It concludes that future work could explore lightweight network architectures, dynamic multi-modal fusion strategies, and physics-aware enhancement techniques for small objects, combining Transformer global modeling with GNN relational reasoning to improve real-time performance, adaptability to complex scenes, and small-object detection accuracy.

