An Improved LightGBM Algorithm Based on Sampling Enhancement and Dynamic Histogram

DOI: 10.12677/csa.2025.155140, PP. 680-689

Keywords: LightGBM Algorithm, Sampling Method, Histogram Algorithm


Abstract:

Gradient boosting algorithms face computational efficiency challenges when processing large-scale data. To address two limitations of LightGBM, namely gradient-based one-side sampling that relies solely on first-order derivatives and thus compromises accuracy, and histogram binning that ignores data distribution characteristics and causes computational redundancy, we propose a Newton-based gradient one-side sampling method that incorporates second-order derivatives to improve sampling precision, together with a dynamic histogram algorithm that performs distribution-aware and label-aware adaptive binning. Experiments on the Epsilon and MNIST8M datasets show that the proposed approach improves model performance while reducing training time by 20.7% and 9.8%, respectively.
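The abstract describes the two techniques without giving formulas, so the following Python sketch only illustrates the general idea. It assumes a Newton-style importance score of g²/h for one-side sampling (mirroring the keep-top/re-weight scheme of LightGBM's GOSS) and quantile-based bin edges for distribution-aware binning; the paper's label-aware component and exact criteria are not reproduced, and all function names are hypothetical.

```python
# Minimal sketch (not the authors' implementation): the g**2 / h score and the
# quantile-based edges below are assumptions used to illustrate the two ideas.
import numpy as np

def newton_one_side_sample(grad, hess, top_rate=0.2, other_rate=0.1, seed=0):
    """Keep the instances with the largest Newton-style scores (g^2 / h) and
    randomly sample the rest with compensating weights, mirroring GOSS."""
    rng = np.random.default_rng(seed)
    n = len(grad)
    score = grad ** 2 / (hess + 1e-12)               # second-order-aware importance
    order = np.argsort(-score)
    n_top, n_other = int(top_rate * n), int(other_rate * n)
    top_idx = order[:n_top]
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    weights = np.ones(n_top + n_other)
    weights[n_top:] = (1.0 - top_rate) / other_rate  # re-weight the sampled tail
    return np.concatenate([top_idx, other_idx]), weights

def quantile_bins(feature, max_bins=255):
    """Distribution-aware binning: place bin edges at feature quantiles so dense
    regions get finer resolution than an equal-width histogram would give."""
    edges = np.unique(np.quantile(feature, np.linspace(0.0, 1.0, max_bins + 1)))
    return np.clip(np.searchsorted(edges, feature, side="right") - 1, 0, len(edges) - 2)
```

Both functions operate on plain NumPy arrays of per-instance gradients, Hessians, and feature values, so they can be exercised in isolation before being wired into a boosting loop.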

