%0 Journal Article
%T An Improved LightGBM Algorithm Based on Sampling Enhancement and Dynamic Histogram
%A 张林
%A 严涛
%J Computer Science and Application
%P 680-689
%@ 2161-881X
%D 2025
%I Hans Publishing
%R 10.12677/csa.2025.155140
%X Gradient boosting algorithms face computational efficiency challenges on large-scale data. This paper addresses two limitations of LightGBM: gradient-based one-side sampling relies solely on first-order derivatives, which compromises accuracy, and histogram binning ignores data distribution characteristics, which leads to redundant computation. We propose a Newton-based gradient one-side sampling method that incorporates second-order derivatives to improve sampling precision, together with a dynamic histogram algorithm that performs distribution-aware and label-aware adaptive binning. Experiments on the Epsilon and MNIST8M datasets show that the new method improves model performance while reducing training time by 20.7% and 9.8%, respectively.
%K LightGBM Algorithm
%K Sampling Method
%K Histogram Algorithm
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=115524