%0 Journal Article
%T Hybrid CNN and ViT for Self-Supervised Knowledge Distillation Monocular Depth Estimation Method
%A 郑千惠
%A 孔玲君
%J Modeling and Simulation
%P 2868-2880
%@ 2324-870X
%D 2024
%I Hans Publishing
%R 10.12677/mos.2024.133260
%X Monocular depth estimation is a challenging task, and existing methods cannot efficiently exploit both the long-range correlations and the local information of features. To address this problem, this paper proposes HCVNet, a self-supervised knowledge distillation monocular depth estimation method that combines a CNN with a ViT (Vision Transformer). HCVNet studies how to combine CNN and Vision Transformer effectively and designs a hybrid CNN-ViT feature encoder that models local and global contextual information and extracts detail features that are more expressive of the scene. A channel feature aggregation module is employed to capture long-range dependencies; by aggregating highly discriminative features along the channel dimension, it enhances the network's perception of scene structure. Self-supervised knowledge distillation is introduced, in which a teacher model with the same structure provides additional supervision signals for training the student model, further improving network performance. Experimental results on the KITTI and Make3D datasets show that the proposed method outperforms current mainstream methods in depth estimation, generalizes well, and produces depth maps with more complete structures and clearer details.
%K Monocular Depth Estimation
%K Self-Supervised Learning
%K Knowledge Distillation
%K Vision Transformer
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=87686