|
- 2018
Adaptive Random Sampling Algorithm Based on Balance Maximization
DOI: 10.12068/j.issn.1005-3026.2018.06.007
Keywords: imbalanced dataset, balance maximization, random sampling, random forest, data preprocessing
Abstract: Classification algorithms often perform poorly on imbalanced datasets. To address this problem, the paper summarizes common data-balancing methods, which fall into two groups: modifying the dataset and improving the algorithm. It then proposes a novel adaptive random sampling algorithm based on balance maximization and applies it in the data preprocessing stage of the random forest algorithm, further improving the random forest's classification performance. Experiments demonstrate the effectiveness of the sampling method: it handles imbalanced data well while keeping overall accuracy within a reasonable range, the newly generated data fit the original data closely, and it improves the classifier's ability to deal with imbalanced data.
|
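The abstract does not spell out the steps of the proposed algorithm, so the following is only a minimal sketch of the general idea it builds on: randomly resampling an imbalanced dataset before handing it to a classifier. It is plain random oversampling, not the paper's adaptive method. The helper names `balance_ratio` and `random_oversample`, and the definition of balance as the smallest class count divided by the largest, are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def balance_ratio(labels):
    """Hypothetical balance measure (an assumption, not the paper's definition):
    smallest class count divided by largest class count; 1.0 means balanced."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())


def random_oversample(samples, labels, seed=0):
    """Plain random oversampling: duplicate randomly chosen samples of each
    smaller class until every class is as large as the largest one."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy imbalanced dataset: five samples of class 0, one sample of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = random_oversample(X, y)
print(balance_ratio(y), balance_ratio(y_bal))
```

In a preprocessing pipeline of the kind the abstract describes, the resampled `X_bal` and `y_bal` would be passed to the classifier's training step in place of the original data. An adaptive method would presumably choose how much to resample per class rather than always equalizing counts, but that detail is not recoverable from the abstract.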