A Decentralized Federated Learning Algorithm Based on Gradient Norm-Aware Minimization
Abstract:
Decentralized federated learning performs privacy-preserving distributed learning across a group of devices, effectively reducing the communication costs and information-leakage risks of centralized federated learning. However, non-independent and identically distributed (Non-IID) data across devices degrades model performance. To address this, almost all existing algorithms adopt empirical risk minimization as the local optimizer, which easily leads to overfitting during local client training and lowers the generalization ability of the global model. To this end, this paper proposes a decentralized federated learning algorithm based on gradient norm-aware minimization, which smooths the loss surface of the global model and enhances its generalization ability.
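To make the two ingredients of the abstract concrete, below is a minimal sketch: a local update that penalizes the gradient norm (a simplified stand-in for the full gradient norm-aware minimization procedure, not the paper's exact algorithm) and a gossip-style neighbor-averaging step for the decentralized topology. The function names (`gam_style_local_step`, `gossip_average`) and hyperparameters (`lr`, `alpha`, `mixing_weights`) are illustrative assumptions, not values from the paper.

```python
import torch

def gam_style_local_step(model, loss_fn, x, y, lr=0.01, alpha=0.1):
    # Illustrative local update: minimize the empirical risk plus a
    # gradient-norm penalty. Penalizing ||grad L|| steers training
    # toward flatter minima, the core idea behind gradient norm-aware
    # minimization; the paper's exact procedure may differ.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    # Keep the first-order gradients in the graph so their norm is
    # itself differentiable (costs one Hessian-vector product).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads) + 1e-12)
    (loss + alpha * grad_norm).backward()
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad
            p.grad = None

@torch.no_grad()
def gossip_average(model, neighbor_models, mixing_weights):
    # Decentralized aggregation: each device replaces its weights with
    # a convex combination of its own and its neighbors' weights
    # (mixing_weights[0] is the self-weight; the weights sum to 1).
    for tensors in zip(model.parameters(),
                       *(m.parameters() for m in neighbor_models)):
        own, neighbors = tensors[0], tensors[1:]
        own.mul_(mixing_weights[0])
        for w, q in zip(mixing_weights[1:], neighbors):
            own.add_(w * q)
```

In a full training loop, each device would alternate several such local steps with a `gossip_average` round over its communication graph, so that flat-minima-seeking local training and decentralized mixing work together.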