%0 Journal Article
%T Feature Selection Method Based on XGBoost and Ant Colony Optimization
%A 张凌翱
%J Computer Science and Application
%P 883-889
%@ 2161-881X
%D 2023
%I Hans Publishing
%R 10.12677/CSA.2023.134086
%X In machine learning, high-dimensional data often contain redundant and irrelevant features, which makes feature selection an important challenge. The Relief algorithm, a traditional feature selection algorithm, is computationally efficient and fairly stable and is widely used in practice. However, its results are random: different initial samplings yield different selections, and on data sets with strong dependencies between features, such as collinearity, the results may be inaccurate. This paper proposes a feature selection method, called X-ACO, that combines XGBoost and ant colony optimization. The feature importance computed by XGBoost serves as the heuristic information in the ant colony path search, and the Pearson correlation coefficient between features is used to adjust the pheromone concentration so as to better control the correlation among the selected features. Experiments show that X-ACO reduces the number of features, lowers feature redundancy, and improves algorithm performance while maintaining classification accuracy.
%K Feature Selection
%K XGBoost
%K Ant Colony Optimization (ACO)
%K Pearson Coefficient
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=64728
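
The abstract describes X-ACO only at a high level, so the following is a minimal sketch of the idea under stated assumptions, not the paper's implementation. The function names (`pearson`, `redundancy`, `x_aco_sketch`), the parameters (`alpha`, `beta`, `rho`, `k`), and in particular the pheromone update rule (the deposit is scaled by one minus the mean absolute Pearson correlation) are all hypothetical choices made for illustration. The `importance` list stands in for XGBoost feature importances (for example, the `feature_importances_` attribute of a fitted `XGBClassifier`), and `evaluate` is an assumed user-supplied function that returns a non-negative score, such as classification accuracy, for a feature subset.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # A constant column has no defined correlation; treat it as 0.
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def redundancy(j, subset, corr):
    """Mean |r| between feature j and the other features in subset."""
    others = [i for i in subset if i != j]
    return mean(abs(corr[j][i]) for i in others) if others else 0.0


def x_aco_sketch(columns, importance, evaluate, k,
                 n_ants=10, n_iter=20, alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Select k features. `columns` is a list of feature columns.

    Heuristic information: `importance` (assumed to come from XGBoost).
    Pheromone adjustment: ASSUMED rule, deposit * (1 - mean |Pearson r|).
    """
    rng = random.Random(seed)
    n = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]
    tau = [1.0] * n  # initial pheromone on every feature
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iter):
        for _ in range(n_ants):
            subset, candidates = [], list(range(n))
            while len(subset) < k:
                # Selection weight: pheromone^alpha * importance^beta.
                weights = [tau[j] ** alpha * (importance[j] + 1e-9) ** beta
                           for j in candidates]
                j = rng.choices(candidates, weights=weights)[0]
                subset.append(j)
                candidates.remove(j)
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score

        # Evaporate, then reinforce the best subset found so far.
        # Highly correlated (redundant) features receive a smaller deposit.
        tau = [(1.0 - rho) * t for t in tau]
        for j in best_subset:
            deposit = max(best_score, 0.0)
            tau[j] += deposit * (1.0 - redundancy(j, best_subset, corr))

    return best_subset, best_score
```

In this sketch, two perfectly correlated features that end up in the best subset receive no pheromone reinforcement, which is one simple way to express the abstract's goal of lowering feature redundancy. The paper itself may adjust the pheromone differently.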