A Stochastic Alternating Direction Method of Multipliers with Adaptive Step Size Selection
Abstract:
This paper studies convex stochastic optimization problems with separable variables and proposes a new stochastic alternating direction method of multipliers (ADMM) for their solution. The algorithm combines ADMM with the stochastic variance reduced gradient method with adaptive step size selection (SVRG-BB): Barzilai-Borwein (BB) step sizes allow the SVRG-ADMM method to choose its step size adaptively, so that neither a decreasing step size schedule nor manual step size tuning is required. Under general assumptions, the convergence of the algorithm is proved. Finally, numerical experiments are given to demonstrate the effectiveness of the algorithm.
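To make the adaptive rule concrete, the BB step size that SVRG-BB computes once per outer epoch can be sketched as follows (a sketch only; the notation $\tilde{x}_s$, $m$, and $F$ is ours and not fixed by the abstract). Given the two most recent snapshot points $\tilde{x}_s$ and $\tilde{x}_{s-1}$ together with their full gradients,
$$
\eta_s \;=\; \frac{1}{m}\,
\frac{\|\tilde{x}_s - \tilde{x}_{s-1}\|_2^2}
{\langle\, \tilde{x}_s - \tilde{x}_{s-1},\; \nabla F(\tilde{x}_s) - \nabla F(\tilde{x}_{s-1}) \,\rangle},
$$
where $m$ is the inner-loop length and $F$ is the full objective. In the SVRG-ADMM variant described above, this $\eta_s$ would take the place of the decreasing or hand-tuned step size in the stochastic gradient step.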