
Depth Estimation and Dynamic Synthesis of Single-Frame Unnatural Images

DOI: 10.12677/CSA.2023.134070, PP. 708-719

Keywords: Monocular Depth Estimation, Unnatural Images, Refinement, Painting Image Datasets, Depth-Image-Based Rendering


Abstract:

Deep learning performs well on monocular depth estimation, estimating the depth of a scene by learning the mapping between a single image and its depth map. However, current research on monocular depth estimation focuses only on natural images. When applied to unnatural images such as paintings, which have low texture, sharp edges, and relatively few smooth transitions compared with natural images, two problems arise: the depth layering between foreground and background objects is weak, and depth is inconsistent within a single object. To address these problems, this paper designs RefineDepth, a refined monocular depth estimation network consisting of a monocular depth estimation module and an RGB-image-guided refinement module. Because painting images lack corresponding depth information, we render 3D scenes in a cartoon style to simulate painting-like unnatural images, producing two painting image datasets, SSMO and SU3D, and additionally build a test set of real landscape paintings. Experimental results show that the model achieves excellent results on all tested datasets. Finally, depth-image-based rendering is applied to the painting images to dynamically synthesize a stereoscopic effect.
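The final step the abstract describes, depth-image-based rendering (DIBR), shifts pixels horizontally by a disparity derived from the estimated depth to synthesize a novel view. The sketch below is a minimal, illustrative forward-warping implementation, not the paper's actual pipeline; the `baseline` and `focal` parameters are assumed camera values chosen for illustration, and hole filling after warping is omitted.

```python
import numpy as np

def dibr_synthesize(rgb, depth, baseline=0.05, focal=500.0):
    """Synthesize a horizontally shifted view from an RGB image and its
    depth map via simple forward warping (depth-image-based rendering).
    Nearer pixels (smaller depth) receive larger disparities.

    rgb:   (H, W, 3) uint8 image
    depth: (H, W) array of positive depth values
    Returns the warped image and a boolean mask of filled pixels
    (unfilled pixels are disocclusion holes).
    """
    h, w = depth.shape
    # Disparity in pixels: proportional to baseline * focal / depth.
    disparity = np.round(baseline * focal / depth).astype(int)

    out = np.zeros_like(rgb)
    filled = np.zeros((h, w), dtype=bool)

    # Process pixels from farthest to nearest so that nearer pixels,
    # written last, overwrite farther ones at collisions.
    order = np.argsort(-depth, axis=None)
    ys, xs = np.unravel_index(order, depth.shape)
    xt = xs + disparity[ys, xs]
    valid = (xt >= 0) & (xt < w)
    for y, x, t in zip(ys[valid], xs[valid], xt[valid]):
        out[y, t] = rgb[y, x]
        filled[y, t] = True
    return out, filled
```

With a constant-depth image every pixel shifts by the same disparity, leaving a band of holes at one edge; with varying depth, disocclusions open up behind near objects, which is where a refined, consistent depth map matters most for the synthesized result.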

