A Theoretical Study on Improving the Training Stability of GANs by Using the Wasserstein Distance as the Optimization Objective
Abstract:
Generative adversarial networks (GANs) have attracted considerable attention for their successful applications in fields such as image generation. However, training instability remains a difficult open problem: the training process is frequently plagued by mode collapse, vanishing gradients, and unstable optimization. Common techniques for improving the stability of GAN training include alternative loss functions, gradient penalties, spectral normalization, batch normalization, and architectural improvements, but most of this work lacks a theoretical basis and provides no relatively complete proofs. The goal of this paper is to develop a deeper understanding of the instability of GAN training from the perspective of the Wasserstein distance and to provide relatively complete theoretical proofs. It also explores strategies for further improving the stability of WGAN training, such as the gradient penalty (WGAN-GP), so as to enhance the stability and generalization ability of WGAN training. The main contributions of this paper are as follows. Part I analyzes how WGAN avoids the vanishing-gradient problem by minimizing the Wasserstein distance (W distance for short) instead of the traditional Jensen-Shannon divergence (JS divergence for short); its key advantage is the use of a 1-Lipschitz continuous discriminator, which ensures that the generator receives informative gradients from the discriminator throughout training. It is then proved that, compared with other distances and divergences, the W distance has better continuity and convergence properties with respect to sequences of probability distributions. Part II shows that replacing the JS divergence between the two distributions with the W distance improves the stability of GAN training in theory. However, the practical implementation of WGAN still faces challenges, such as under-utilization of network capacity and vanishing gradients caused by weight clipping. To address this, building on the W distance, Gulrajani et al. proposed the gradient penalty (WGAN-GP) as a way to satisfy the Lipschitz constraint and further improve training stability. Most of the literature, however, simply takes the gradient-penalty target constant to be 1 without proof; this paper provides that proof.
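For reference, the two objectives discussed in the abstract can be written out explicitly. The following is a brief sketch using the standard notation of [5] and [7], where \(\mathbb{P}_r\) denotes the data distribution, \(\mathbb{P}_g\) the generator distribution, and \(\Pi(\mathbb{P}_r,\mathbb{P}_g)\) the set of joint distributions with these marginals. The Wasserstein-1 distance and its Kantorovich-Rubinstein dual form, which motivates the 1-Lipschitz discriminator, are

\[
W(\mathbb{P}_r,\mathbb{P}_g)
=\inf_{\gamma\in\Pi(\mathbb{P}_r,\mathbb{P}_g)}\mathbb{E}_{(x,y)\sim\gamma}\big[\|x-y\|\big]
=\sup_{\|f\|_L\le 1}\ \mathbb{E}_{x\sim\mathbb{P}_r}[f(x)]-\mathbb{E}_{x\sim\mathbb{P}_g}[f(x)].
\]

The WGAN-GP critic loss of [7] replaces weight clipping with a penalty on the gradient norm of the critic \(D\) at points \(\hat{x}\) interpolated between real and generated samples, with the penalty centred at 1 (the constant whose justification is one subject of this paper):

\[
L_D=\mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}[D(\tilde{x})]-\mathbb{E}_{x\sim\mathbb{P}_r}[D(x)]
+\lambda\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\big[(\|\nabla_{\hat{x}}D(\hat{x})\|_2-1)^2\big].
\]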
[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. (2014) Generative Adversarial Nets. Advances in Neural Information Processing Systems, 27, 2672-2680.
[2] Karras, T., Aila, T., Laine, S. and Lehtinen, J. (2017) Progressive Growing of GANs for Improved Quality, Stability, and Variation. Advances in Neural Information Processing Systems, Long Beach, 4-9 December 2017, 1-26.
[3] Brock, A., Donahue, J. and Simonyan, K. (2018) Large Scale GAN Training for High Fidelity Natural Image Synthesis. Advances in Neural Information Processing Systems, Montréal, 3-8 December 2018, 1-35.
[4] Arjovsky, M. and Bottou, L. (2017) Towards Principled Methods for Training Generative Adversarial Networks. International Conference on Learning Representations, Toulon, 24-26 April 2017, 1-10.
[5] Arjovsky, M., Chintala, S. and Bottou, L. (2017) Wasserstein Generative Adversarial Networks. International Conference on Machine Learning, Sydney, 6-11 August 2017, 214-223.
[6] Mescheder, L., Geiger, A. and Nowozin, S. (2017) Which Training Methods for GANs Do Actually Converge? Advances in Neural Information Processing Systems, Long Beach, 4-9 December 2017, 3481-3490.
[7] Gulrajani, I., Ahmed, F., Arjovsky, M., et al. (2017) Improved Training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30, 1-11.
[8] Miyato, T., Kataoka, T., Koyama, M. and Yoshida, Y. (2018) Spectral Normalization for Generative Adversarial Networks. Advances in Neural Information Processing Systems, Montréal, 3-8 December 2018, 1-26.
[9] Zhang, H., Goodfellow, I., Metaxas, D. and Odena, A. (2019) Self-Attention Generative Adversarial Networks. Proceedings of the 36th International Conference on Machine Learning, California, 9-15 June 2019, 7354-7363.
[10] Maddison, C.J., Mnih, A. and Teh, Y.W. (2016) The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. International Conference on Machine Learning, New York, 19-24 June 2016, 2951-2960.