%0 Journal Article
%T Survey on 3D Object Detection Based on Deep Learning
%A 邵博
%A 徐仕琦
%A 张子琦
%A 杨琳倩
%A 高炜晴
%A 何嘉懿
%A 秦傲雪
%A 吴茜茵
%J Journal of Image and Signal Processing
%P 173-184
%@ 2325-6745
%D 2025
%I Hans Publishing
%R 10.12677/jisp.2025.142017
%X Deep-learning-based 3D object detection is important in frontier fields such as autonomous driving and robot navigation. The technology still faces several challenges, including high computational complexity when processing large-scale 3D data and difficulty detecting small objects, which severely limit its further development and wider application. To address these problems, this article introduces commonly used datasets such as KITTI and NuScenes in detail and gives a categorized analysis of 3D object detection methods based on images, point clouds, and multi-sensor fusion. Image-based methods are limited by insufficient depth information and achieve lower detection accuracy; point cloud-based methods exploit depth information and show a clear accuracy advantage; multi-sensor fusion methods deliver stronger detection performance; and methods based on Transformers and graph neural networks (GNNs) advance the field through global context modeling and spatial relationship reasoning. By evaluating and comparing mainstream models on the KITTI and NuScenes datasets, the article analyzes how the models differ in detection accuracy and in adaptability to complex scenes. It concludes that future work could explore lightweight network architectures, dynamic multi-modal fusion strategies, and physics-aware enhancement techniques for small objects, combining Transformer global modeling with GNN relational reasoning to improve real-time performance, adaptability to complex scenes, and small-object detection accuracy.
%K Object Detection
%K 3D Vehicle Detection
%K Deep Learning
%K Computer Vision
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=110634