%0 Journal Article
%T An Improved LightGBM Algorithm Based on Sampling Enhancement and Dynamic Histogram
%A 张林
%A 严涛
%J Computer Science and Application
%P 680-689
%@ 2161-881X
%D 2025
%I Hans Publishing
%R 10.12677/csa.2025.155140
%X Gradient boosting algorithms face computational efficiency challenges when processing large-scale data. This paper addresses two limitations of LightGBM: gradient-based one-side sampling relies solely on first-order derivatives, which compromises accuracy, and histogram binning ignores the characteristics of the data distribution, which leads to redundant computation. We propose a Newton-based gradient one-side sampling method that incorporates second-order derivatives to improve sampling precision, together with a dynamic histogram algorithm that performs distribution-aware and label-aware adaptive binning. Experiments on the Epsilon and MNIST8M datasets show that the approach improves model performance while reducing training time by 20.7% and 9.8%, respectively.
%K LightGBM Algorithm
%K Sampling Method
%K Histogram Algorithm
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=115524