OALib Journal期刊
ISSN: 2333-9721

Self-Supervised Monocular Depth Estimation Based on Multi-Scale Graph Attention Mechanism

DOI: 10.12677/airr.2025.142031, PP. 313-319

Keywords: Graph Convolutional Networks, Graph Attention Networks, GAT Modules, Self-Supervised Training


Abstract:

To address the insufficient depiction of complex scene geometry in monocular self-supervised depth estimation, where accurate ground-truth depth is unavailable, this paper introduces a Graph Attention Network (GAT) mechanism into an existing monocular depth estimation framework based on Graph Convolutional Networks (GCN), yielding a model called GATDepth. By adopting graph attention modules in the decoder stage, the model adaptively assigns different weights to adjacent nodes, thereby preserving the geometric topology and discontinuities of the scene more finely. The DepthNet encoder extracts multi-level visual features with CNNs, while the decoder combines transposed-convolution upsampling with GAT modules to fuse node features. The model is trained in a self-supervised manner using multiple losses between the target image and the reconstructed image, including photometric, reprojection, and smoothness terms. It achieves excellent depth estimation performance on the KITTI dataset, particularly in key regions such as distant objects and object edges. Experimental results show that the proposed method not only captures key geometric information of the scene while maintaining network efficiency, but also produces reliable and fine-grained depth predictions in the absence of high-quality ground-truth depth.
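The core operation the abstract attributes to the GAT modules, adaptively weighting adjacent nodes before aggregating their features, can be illustrated with a minimal single-head graph attention layer in the standard GAT formulation (softmax over LeakyReLU attention logits on each neighbourhood). This is a generic sketch for intuition only, not the paper's implementation: the function name, the plain-list tensor representation, and the tiny dimensions are all assumptions.

```python
import math

def gat_attention(h, adj, W, a):
    """Single-head graph attention in the standard GAT formulation (a sketch,
    not the GATDepth implementation).

    h:   node features, h[i] is a list of floats
    adj: adjacency list, adj[i] = neighbours of node i (self-loop included)
    W:   shared linear projection, list of rows (out_dim x in_dim)
    a:   attention vector of length 2 * out_dim
    Returns the attended feature vector for every node.
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def leaky_relu(x, slope=0.2):
        return x if x > 0.0 else slope * x

    # Project every node with the shared weight matrix: z_i = W h_i
    z = [matvec(W, hi) for hi in h]

    out = []
    for i, neigh in enumerate(adj):
        # Unnormalised logits e_ij = LeakyReLU(a^T [z_i || z_j])
        logits = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in neigh]
        # Softmax over the neighbourhood gives the adaptive weights alpha_ij
        m = max(logits)
        exps = [math.exp(e - m) for e in logits]
        s = sum(exps)
        alpha = [e / s for e in exps]
        # Aggregate: weighted sum of projected neighbour features
        out.append([sum(al * z[j][k] for al, j in zip(alpha, neigh))
                    for k in range(len(z[i]))])
    return out
```

The adaptive weights `alpha` are what distinguish a GAT layer from a plain GCN layer, where every neighbour would receive the same normalised weight; this is the property the abstract credits with preserving depth discontinuities at object edges.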

