Decoupling Dynamic Regions for Self-Supervised Monocular Depth Estimation
Abstract:
In recent years, self-supervised monocular depth estimation has attracted extensive attention in computer vision because it requires no depth labels. However, traditional self-supervised monocular depth prediction methods typically rest on a static-scene assumption, so depth prediction accuracy drops significantly when dynamic objects appear across consecutive frames. To address this issue, this paper proposes a multi-frame self-supervised monocular depth estimation model. The model first identifies moving objects with a segmentation network, then reconstructs images using optical-flow information between multiple frames. By handling static scenes and dynamic objects separately, the approach effectively improves the accuracy of depth estimation for dynamic objects. In addition, this paper designs a Dynamic Object Reconstruction Loss (DRL) and a Depth Consistency Loss (DCL) to supervise the generation of the dynamic reconstruction image and the reconstructed depth map. Experiments on three public datasets demonstrate that the method outperforms existing mainstream approaches and predicts accurate depth maps in dynamic scenes.
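The core idea of the abstract — routing dynamic pixels to a flow-based reconstruction and static pixels to a pose-based one, plus a consistency term on the dynamic depth — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `masked_photometric_loss` and `depth_consistency_loss` and the exact formulations of DRL and DCL are assumptions, and the per-pixel errors here are plain L1 rather than any SSIM-weighted photometric term the paper may use.

```python
import numpy as np

def masked_photometric_loss(target, recon_static, recon_dynamic, dyn_mask):
    """Sketch of a decoupled reconstruction loss (assumed form of DRL):
    dynamic pixels (dyn_mask == True) are scored against the flow-warped
    reconstruction, static pixels against the pose-based reconstruction."""
    err_static = np.abs(target - recon_static).mean(axis=-1)    # per-pixel L1, static branch
    err_dynamic = np.abs(target - recon_dynamic).mean(axis=-1)  # per-pixel L1, dynamic branch
    combined = np.where(dyn_mask, err_dynamic, err_static)      # route by segmentation mask
    return combined.mean()

def depth_consistency_loss(depth_a, depth_b, dyn_mask):
    """Sketch of a depth consistency term (assumed form of DCL):
    normalized absolute difference between two depth predictions,
    averaged over the dynamic region only."""
    diff = np.abs(depth_a - depth_b) / (depth_a + depth_b + 1e-8)
    return diff[dyn_mask].mean() if dyn_mask.any() else 0.0
```

In practice both reconstructions would come from differentiable warping (pose-based reprojection for the static branch, optical flow for the dynamic branch), and the masked losses would be summed with weighting factors during training.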