Face Image Super-Resolution Reconstruction Based on GAN and Transformer
Abstract:
To meet industrial demand for acquiring face images at low cost, in high quality, and in large volume, this paper proposes a super-resolution model based on a generative adversarial network (GAN) and Transformer. In the generator, a newly designed densely connected Transformer structure replaces the traditional convolutional layers to establish global feature dependencies, improving feature extraction and image reconstruction quality. In the discriminator, a U-Net structure with stronger discriminative ability is adopted to match the generator's performance. To address the limited generality of previous image degradation approaches, an image degradation model is proposed that generates training image pairs in real time, greatly enriching the degradation scenarios and the dataset. To render facial features in finer detail, the proposed model is further trained on a self-built dataset. Experimental results show that the proposed model performs better than the other models compared in terms of lines, textures, and sharpness.
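The idea of generating training image pairs in real time can be illustrated with a minimal sketch. The abstract does not specify the degradation operations or their parameters, so everything below is an assumption made for illustration: the choice of Gaussian blur, naive decimation, and additive Gaussian noise, the parameter ranges, and the function names `gaussian_kernel`, `blur`, `degrade`, and `training_pairs` are hypothetical and are not the authors' degradation model.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a normalized 2-D Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def blur(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Blur an H x W x C float image with an odd-sized kernel (reflect padding)."""
    pad = kernel.shape[0] // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape[:2]
    # Accumulate shifted copies of the image weighted by the kernel entries.
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def degrade(hr: np.ndarray, scale: int, rng: np.random.Generator) -> np.ndarray:
    """Make a low-resolution image from `hr` using randomly drawn parameters."""
    sigma = rng.uniform(0.2, 3.0)        # assumed blur range, not from the paper
    noise_std = rng.uniform(0.0, 0.05)   # assumed noise range, not from the paper
    lr = blur(hr, gaussian_kernel(7, sigma))
    lr = lr[::scale, ::scale]            # naive decimation by the scale factor
    lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return np.clip(lr, 0.0, 1.0)


def training_pairs(hr_images, scale: int = 4, seed: int = 0):
    """Yield (low-resolution, high-resolution) pairs, degrading each image on the fly."""
    rng = np.random.default_rng(seed)
    for hr in hr_images:
        yield degrade(hr, scale, rng), hr
```

Because the degradation parameters are redrawn for every image, the same high-resolution input produces a different low-resolution counterpart each time it is visited, which is one way a fixed set of images could cover a wider range of degradation scenarios.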