Semantic-Aware Style Transfer Based on 3D Scenes

DOI: 10.12677/pm.2025.154106, PP. 31-45

Keywords: Style Transfer, 3D Gaussian Splatting, Semantic Perception


Abstract:

With the rapid development of the film and gaming industries, methods for creating and editing 3D scenes have been continuously optimized, evolving toward greater efficiency and convenience. Compared to traditional representations based on meshes and point clouds, 3D Gaussian Splatting provides a more flexible and efficient way to represent 3D scenes, enabling high-quality novel view synthesis while maintaining superior rendering performance. However, existing 3D Gaussian models still have limitations in stylization, making it difficult to meet the demands of creative design and artistic expression. Therefore, achieving high-quality style transfer while preserving 3D structural information remains a challenging research problem. To address this issue, we propose a semantic style transfer method based on 3D Gaussians. First, a 3D Gaussian model is trained using multi-view images, and style transfer is performed on these images to ensure consistency and structural integrity in the final 3D model. Specifically, we utilize the LSeg model for semantic segmentation of content and style images. After extracting corresponding regions, we adaptively determine the number of clusters based on image complexity and apply K-means clustering in the color space to segment the images. The clustered regions are then filtered based on their area to retain essential structural information. Subsequently, style transfer is performed using semantic matching, and style fusion is achieved with the Whitening and Coloring Transform (WCT). Finally, a VGG-based decoder generates the stylized images. Experimental results demonstrate that our method outperforms existing approaches in terms of style quality, structural preservation, and multi-view consistency, providing better controllability and higher-quality style transfer for 3D artistic content creation.
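
To make the segmentation stage concrete, here is a minimal Python sketch of color-space K-means with an adaptively chosen cluster count and area-based filtering, applied after LSeg [5] has extracted semantic regions. The complexity measure (luminance-histogram entropy), the cluster-count range, the 5% area threshold, and the function name segment_by_color are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from sklearn.cluster import KMeans

def segment_by_color(image, k_min=2, k_max=8, min_area_ratio=0.05):
    # image: (H, W, 3) float array with values in [0, 1].
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3)

    # Assumed complexity proxy: entropy of a 32-bin luminance histogram
    # (0 to 5 bits); more complex images receive more clusters.
    gray = pixels.mean(axis=1)
    counts, _ = np.histogram(gray, bins=32, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())
    k = int(round(k_min + entropy / 5.0 * (k_max - k_min)))

    # K-means clustering directly in color space.
    labels = KMeans(n_clusters=k, n_init=4, random_state=0).fit_predict(pixels)
    labels = labels.reshape(h, w)

    # Keep only clusters large enough to carry structural information.
    kept = [c for c in range(k) if (labels == c).mean() >= min_area_ratio]
    return labels, kept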
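
The style-fusion step uses the Whitening and Coloring Transform of Li et al. [3]. Below is a minimal NumPy sketch of the standard WCT on per-channel flattened VGG features; in the pipeline described above it would be applied once per semantically matched content/style region pair before the VGG decoder reconstructs the stylized image. The blending weight alpha and the eps regularizer are assumed hyperparameters.

import numpy as np

def wct(content_feat, style_feat, alpha=0.6, eps=1e-5):
    # content_feat, style_feat: (C, N) flattened VGG activations.
    mc = content_feat.mean(axis=1, keepdims=True)
    ms = style_feat.mean(axis=1, keepdims=True)
    fc = content_feat - mc
    fs = style_feat - ms

    # Whitening: decorrelate the content feature channels.
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(fc.shape[0])
    wc, vc = np.linalg.eigh(cov_c)
    whitened = vc @ np.diag(np.clip(wc, eps, None) ** -0.5) @ vc.T @ fc

    # Coloring: impose the style covariance, then restore the style mean.
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(fs.shape[0])
    ws, vs = np.linalg.eigh(cov_s)
    colored = vs @ np.diag(np.clip(ws, eps, None) ** 0.5) @ vs.T @ whitened + ms

    # Blend stylized and original content features before decoding.
    return alpha * colored + (1 - alpha) * content_feat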

References

[1]  Kerbl, B., Kopanas, G., Leimkuehler, T. and Drettakis, G. (2023) 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 42, 1-14.
https://doi.org/10.1145/3592433

[2]  Huang, X. and Belongie, S. (2017) Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 1501-1510.
https://doi.org/10.1109/iccv.2017.167

[3]  Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X. and Yang, M.H. (2017) Universal Style Transfer via Feature Transforms. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 386-396.
[4]  Zhang, Y., Fang, C., Wang, Y., Wang, Z., Lin, Z., Fu, Y., et al. (2019) Multimodal Style Transfer via Graph Cuts. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 5942-5950.
https://doi.org/10.1109/iccv.2019.00604

[5]  Li, B., Weinberger, K.Q., Belongie, S., Koltun, V. and Ranftl, R. (2022) Language-Driven Semantic Segmentation.
[6]  Kyprianidis, J.E., Collomosse, J., Wang, T. and Isenberg, T. (2013) State of the “Art”: A Taxonomy of Artistic Stylization Techniques for Images and Video. IEEE Transactions on Visualization and Computer Graphics, 19, 866-885.
https://doi.org/10.1109/tvcg.2012.160

[7]  Portilla, J. and Simoncelli, E.P. (2000) A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision, 40, 49-70.
[8]  Simonyan, K. and Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition.
[9]  Risser, E., Wilmot, P. and Barnes, C. (2017) Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses.
[10]  Gu, S., Chen, C., Liao, J. and Yuan, L. (2018) Arbitrary Style Transfer with Deep Feature Reshuffle. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 8222-8231.
https://doi.org/10.1109/cvpr.2018.00858

[11]  Kolkin, N., Salavon, J. and Shakhnarovich, G. (2019) Style Transfer by Relaxed Optimal Transport and Self-Similarity. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 10051-10060.
https://doi.org/10.1109/cvpr.2019.01029

[12]  Liao, J., Yao, Y., Yuan, L., Hua, G. and Kang, S.B. (2017) Visual Attribute Transfer through Deep Image Analogy. ACM Transactions on Graphics, 36, 1-15.
https://doi.org/10.1145/3072959.3073683

[13]  An, J., Huang, S., Song, Y., Dou, D., Liu, W. and Luo, J. (2021) ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 862-871.
https://doi.org/10.1109/cvpr46437.2021.00092

[14]  Chen, C. (2020) Structure-Emphasized Multimodal Style Transfer. Zenodo.
[15]  Park, D.Y. and Lee, K.H. (2019) Arbitrary Style Transfer with Style-Attentional Networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 5880-5888.
https://doi.org/10.1109/cvpr.2019.00603

[16]  Liu, K., Zhan, F., Chen, Y., Zhang, J., Yu, Y., Saddik, A.E., et al. (2023) StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 8338-8348.
https://doi.org/10.1109/cvpr52729.2023.00806

[17]  Martin-Brualla, R., Radwan, N., Sajjadi, M.S.M., Barron, J.T., Dosovitskiy, A. and Duckworth, D. (2021) NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 20-25 June 2021, 7210-7219.
https://doi.org/10.1109/cvpr46437.2021.00713

[18]  Mildenhall, B., Hedman, P., Martin-Brualla, R., Srinivasan, P.P. and Barron, J.T. (2022) NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 16190-16199.
https://doi.org/10.1109/cvpr52688.2022.01571

[19]  Mishra, S. and Granskog, J. (2022) CLIP-Based Neural Neighbor Style Transfer for 3D Assets.
[20]  Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R. and Ng, R. (2021) NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Communications of the ACM, 65, 99-106.
https://doi.org/10.1145/3503250

[21]  Chan, E.R., Lin, C.Z., Chan, M.A., Nagano, K., Pan, B., de Mello, S., et al. (2022) Efficient Geometry-Aware 3D Generative Adversarial Networks. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 16123-16133.
https://doi.org/10.1109/cvpr52688.2022.01565

[22]  Chen, A., Xu, Z., Geiger, A., Yu, J. and Su, H. (2022) TensoRF: Tensorial Radiance Fields. 17th European Conference on Computer Vision, Tel Aviv, 23-27 October 2022, 333-350.
https://doi.org/10.1007/978-3-031-19824-3_20

[23]  Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B. and Kanazawa, A. (2023) K-Planes: Explicit Radiance Fields in Space, Time, and Appearance. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, 17-24 June 2023, 12479-12488.
https://doi.org/10.1109/cvpr52729.2023.01201

[24]  Kato, H., Ushiku, Y. and Harada, T. (2018) Neural 3D Mesh Renderer. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, 3907-3916.
https://doi.org/10.1109/cvpr.2018.00411

[25]  Michel, O., Bar-On, R., Liu, R., Benaim, S. and Hanocka, R. (2022) Text2Mesh: Text-Driven Neural Stylization for Meshes. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 13492-13502.
https://doi.org/10.1109/cvpr52688.2022.01313

[26]  Yin, K., Gao, J., Shugrina, M., Khamis, S. and Fidler, S. (2021) 3DStyleNet: Creating 3D Shapes with Geometric and Texture Style Variations. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 12456-12465.
https://doi.org/10.1109/iccv48922.2021.01223

[27]  Huang, H., Tseng, H., Saini, S., Singh, M. and Yang, M. (2021) Learning to Stylize Novel Views. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, 10-17 October 2021, 13869-13878.
https://doi.org/10.1109/iccv48922.2021.01361

[28]  Zhang, K., Kolkin, N., Bi, S., Luan, F., Xu, Z., Shechtman, E., et al. (2022) ARF: Artistic Radiance Fields. 17th European Conference on Computer Vision, Tel Aviv, 23-27 October 2022, 717-733.
https://doi.org/10.1007/978-3-031-19821-2_41

[29]  Gatys, L., Ecker, A. and Bethge, M. (2016) A Neural Algorithm of Artistic Style. Journal of Vision, 16, Article No. 326.
https://doi.org/10.1167/16.12.326

[30]  Huang, Y., He, Y., Yuan, Y., Lai, Y. and Gao, L. (2022) StylizedNeRF: Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 18342-18352.
https://doi.org/10.1109/cvpr52688.2022.01780

[31]  Wang, C., Jiang, R., Chai, M., He, M., Chen, D. and Liao, J. (2024) NeRF-Art: Text-Driven Neural Radiance Fields Stylization. IEEE Transactions on Visualization and Computer Graphics, 30, 4983-4996.
https://doi.org/10.1109/tvcg.2023.3283400

[32]  Xu, S., Li, L., Shen, L. and Lian, Z. (2023) DeSRF: Deformable Stylized Radiance Field. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, 17-24 June 2023, 709-718.
https://doi.org/10.1109/cvprw59228.2023.00078

[33]  Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021) Learning Transferable Visual Models from Natural Language Supervision. International Conference on Machine Learning, 18-24 July 2021, 8748-8763.
[34]  Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale.
[35]  Gal, R., Patashnik, O., Maron, H., Bermano, A.H., Chechik, G. and Cohen-Or, D. (2022) StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators. ACM Transactions on Graphics, 41, 1-13.
https://doi.org/10.1145/3528223.3530164

[36]  Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014) Microsoft COCO: Common Objects in Context. 13th European Conference on Computer Vision, Zurich, 6-12 September 2014, 740-755.
https://doi.org/10.1007/978-3-319-10602-1_48

[37]  Nichol, K. (2016) Painter by Numbers. Wikiart.
https://github.com/inejc/painters
[38]  Kingma, D.P. and Ba, J.L. (2014) Adam: A Method for Stochastic Optimization.
[39]  Knapitsch, A., Park, J., Zhou, Q. and Koltun, V. (2017) Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction. ACM Transactions on Graphics, 36, 1-13.
https://doi.org/10.1145/3072959.3073599

[40]  Wang, C., Chai, M., He, M., Chen, D. and Liao, J. (2022) CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 18-24 June 2022, 3835-3844.
https://doi.org/10.1109/cvpr52688.2022.00381
