Feature Selection Method Based on XGBoost and Ant Colony Optimization

DOI: 10.12677/CSA.2023.134086, PP. 883-889

Keywords: Feature Selection, XGBoost, Ant Colony Optimization (ACO), Pearson Coefficient


Abstract:

High-dimensional data in machine learning often contain redundant and irrelevant features, which makes feature selection an important challenge. Relief, a traditional feature selection algorithm for such data, is computationally efficient and fairly stable and is widely used in practice. Its results are random, however: different initial samples yield different selections, and on data sets with strong dependencies between features, such as collinearity, the results may be inaccurate. This paper proposes a feature selection method, called X-ACO, that combines XGBoost with ant colony optimization (ACO). The feature importance computed by XGBoost serves as the heuristic information in the ant colony path search, and the Pearson correlation coefficient between features is used to adjust the pheromone concentration so that correlation among the selected features is better controlled. Experiments show that X-ACO reduces the number of features and the redundancy among them and improves algorithm performance while maintaining classification accuracy.
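
The abstract does not state the update formulas, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes the importance scores are supplied as a plain list (in practice they might come from a fitted XGBoost model, for example its `feature_importances_` attribute), uses them as the heuristic term in the ants' transition weights, and scales the pheromone deposit down for features that are strongly Pearson-correlated with others in the best subset. The names `x_aco_sketch` and `redundancy`, the parameters `alpha`, `beta`, and `rho`, and the fitness function (a stand-in for the classification accuracy used in the paper) are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return max(-1.0, min(1.0, cov / (sx * sy)))  # clamp rounding error


def redundancy(subset, corr):
    """Mean absolute pairwise correlation inside a subset (0 if under 2)."""
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    if not pairs:
        return 0.0
    return sum(abs(corr[i][j]) for i, j in pairs) / len(pairs)


def x_aco_sketch(columns, importance, k, n_ants=10, n_iter=20,
                 alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Pick k features; `importance` stands in for XGBoost importances."""
    rng = random.Random(seed)
    d = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(d)]
            for i in range(d)]
    tau = [1.0] * d  # one pheromone value per feature
    best, best_score = None, -math.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            subset = []
            while len(subset) < k:
                cand = [j for j in range(d) if j not in subset]
                # transition weight: pheromone^alpha * heuristic^beta
                weights = [(tau[j] ** alpha) * ((importance[j] + 1e-12) ** beta)
                           for j in cand]
                subset.append(rng.choices(cand, weights=weights)[0])
            # ASSUMED fitness: total importance discounted by redundancy
            score = sum(importance[j] for j in subset) * (1.0 - redundancy(subset, corr))
            if score > best_score:
                best, best_score = sorted(subset), score
        # evaporate, then deposit on the best subset found so far;
        # ASSUMED rule: correlated features receive a smaller deposit
        tau = [(1.0 - rho) * t for t in tau]
        for j in best:
            others = [abs(corr[j][i]) for i in best if i != j]
            penalty = max(others) if others else 0.0
            tau[j] += best_score * (1.0 - penalty)
    return best, best_score


# Toy data: feature 1 duplicates feature 0 (scaled); feature 2 is uncorrelated.
toy_columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [1, -1, 0, 0, -1, 1],
]
toy_importance = [0.5, 0.4, 0.3]
print(x_aco_sketch(toy_columns, toy_importance, k=2, n_ants=50))
```

In this toy setting the two most important features are perfectly collinear, so ranking by importance alone would keep both, whereas the correlation-aware score favors pairing one of them with the uncorrelated feature. A faithful reproduction would replace the stand-in fitness with the accuracy of a classifier trained on each candidate subset.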

