Application of a Novel Fractional-Order Gradient Descent Method in Deep Neural Networks
Abstract:
This paper proposes a new fractional-order gradient descent method, based on Caputo fractional calculus, for training neural network models. By changing the lower limit of the integration interval, the algorithm extends the admissible fractional order to the interval (0, 2), which widens the range of orders available for selection. Using a gradient clipping mechanism, the paper also proves the convergence of the algorithm from the perspective of the regret function, which establishes its theoretical feasibility. Finally, numerical experiments on the public CIFAR-10 dataset show that, when a suitable order is chosen, the proposed algorithm achieves faster convergence and higher convergence accuracy than the conventional integer-order gradient method.
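The abstract does not state the update rule itself. As a rough illustration only, the sketch below assumes a form that is common in Caputo-based gradient methods: the derivative is truncated to its first-order term and the lower terminal of the integral is set to the previous iterate, so each gradient component is scaled by |x_k - x_{k-1}|^(1 - alpha) / Gamma(2 - alpha), and the resulting direction is clipped by its norm. The function name `fractional_gd`, its parameters, the `eps` safeguard, and the quadratic test objective are all assumptions made for illustration; they are not the paper's algorithm or its CIFAR-10 experiment.

```python
import math
import numpy as np


def fractional_gd(grad, x0, alpha=0.9, lr=0.5, clip=1.0, eps=1e-8, steps=500):
    """Hypothetical Caputo-type fractional gradient descent with norm clipping.

    Assumed update (not taken from the paper):
        d_k     = grad(x_k) * (|x_k - x_{k-1}| + eps)^(1 - alpha) / Gamma(2 - alpha)
        x_{k+1} = x_k - lr * clip_by_norm(d_k, clip)
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in the open interval (0, 2)")

    x_prev = np.asarray(x0, dtype=float)
    # One ordinary gradient step provides the second point of the history.
    x = x_prev - lr * grad(x_prev)
    # Gamma(2 - alpha) is finite and positive for every alpha in (0, 2).
    scale = 1.0 / math.gamma(2.0 - alpha)

    for _ in range(steps):
        g = grad(x)
        # Lower terminal = previous iterate. eps keeps the power finite
        # when alpha > 1 and two consecutive iterates coincide.
        d = g * scale * (np.abs(x - x_prev) + eps) ** (1.0 - alpha)
        # Clip the fractional direction by its Euclidean norm.
        norm = np.linalg.norm(d)
        if norm > clip:
            d = d * (clip / norm)
        x_prev, x = x, x - lr * d
    return x


# Toy objective (an assumption): f(x) = 0.5 * ||x - target||^2.
target = np.array([1.0, 2.0])
x_star = fractional_gd(lambda x: x - target, x0=[3.0, -2.0], alpha=0.9)
print(x_star)
```

With alpha = 1 the scaling factor equals 1 and the step reduces to ordinary clipped gradient descent. For 1 < alpha < 2 the exponent 1 - alpha is negative, so the factor grows as consecutive iterates approach each other; in this sketch only the clipping and `eps` keep the step bounded, and the sketch makes no convergence claim for that range.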