A High-Resolution Image Synthesis Model Based on Conditional GANs
Abstract:
This study presents a conditional generative adversarial network (cGAN) approach for synthesizing high-resolution, photo-realistic images from semantic label maps. Although conditional GANs have shown broad potential across many domains, the images they generate are typically of low resolution and still differ noticeably from real photographs. To address this challenge, we introduce a novel adversarial loss together with a multi-scale generator and discriminator architecture that improve both the quality and the resolution of the synthesized images. Specifically, our method produces 2048 × 1024 images that are markedly more visually appealing than those of prior work. Compared with existing techniques, it shows clear advantages in both the quality and the resolution of deep image synthesis and editing. The contribution of this work lies in a new adversarial learning objective and a multi-scale architecture that mitigate the training instability of cGANs at high resolutions and substantially improve the realism of image details and textures, offering a new technical path for high-resolution image synthesis.
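The abstract does not give implementation details, so the following is a minimal PyTorch sketch of how a multi-scale discriminator with a least-squares adversarial loss could be wired up. PatchDiscriminator, MultiScaleDiscriminator, lsgan_d_loss, and all layer sizes here are illustrative assumptions for exposition, not the authors' actual architecture or loss.

```python
# Hypothetical sketch only: a multi-scale, patch-based discriminator and an
# LSGAN-style discriminator loss. The paper's exact layer counts, loss
# weights, and training schedule are not specified in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchDiscriminator(nn.Module):
    """A small convolutional discriminator operating on a single image scale."""

    def __init__(self, in_channels: int, base_channels: int = 64):
        super().__init__()
        layers = []
        channels = [in_channels, base_channels, base_channels * 2, base_channels * 4]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        layers += [nn.Conv2d(channels[-1], 1, kernel_size=4, stride=1, padding=1)]
        self.model = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.model(x)  # per-patch realism scores


class MultiScaleDiscriminator(nn.Module):
    """Runs identical discriminators on progressively downsampled inputs,
    so coarse scales judge global structure and fine scales judge texture."""

    def __init__(self, in_channels: int, num_scales: int = 3):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [PatchDiscriminator(in_channels) for _ in range(num_scales)]
        )
        self.downsample = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor):
        outputs = []
        for disc in self.discriminators:
            outputs.append(disc(x))
            x = self.downsample(x)  # halve resolution for the next scale
        return outputs


def lsgan_d_loss(real_outs, fake_outs):
    """Least-squares discriminator loss summed over scales (an assumption; the
    abstract only states that a new adversarial objective is used)."""
    loss = 0.0
    for real, fake in zip(real_outs, fake_outs):
        loss = loss + F.mse_loss(real, torch.ones_like(real)) \
                    + F.mse_loss(fake, torch.zeros_like(fake))
    return loss
```

In a conditional setup, each scale's discriminator would typically receive the channel-wise concatenation of the semantic label map and the real or generated image, and the generator would be trained against the sum of the per-scale adversarial terms.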