|
- 2018
Cost-Sensitive Feature Selection Based on Sample Neighborhood Preserving
|
Abstract:
Feature selection is an important preprocessing step in machine learning and data mining, and feature selection for class-imbalanced data is an active research topic in machine learning and pattern recognition. Most traditional feature selection and classification algorithms pursue high accuracy and assume that misclassification either carries no cost or carries the same cost for every error. In real applications, however, different misclassifications often incur different costs. To obtain the feature subset with minimum misclassification cost, this paper proposes a supervised cost-sensitive feature selection algorithm based on sample neighborhood preserving. Its core idea is to introduce the sample neighborhood into the existing cost-sensitive feature selection framework. Experimental results on eight real-world data sets demonstrate the superiority of the proposed algorithm.
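The distinction between accuracy and misclassification cost can be illustrated with a minimal sketch. This is not the proposed algorithm. The cost matrix, labels, and function names below are hypothetical values chosen only to show that two predictors with identical accuracy can differ widely in total cost once errors are weighted unequally.

```python
def total_misclassification_cost(y_true, y_pred, cost):
    """Sum cost[t][p] over all samples.

    cost[t][p] is the (assumed) cost of predicting class p when the true class is t.
    """
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical cost matrix: missing a minority-class sample (class 1) costs 10,
# a false alarm on a majority-class sample (class 0) costs 1.
COST = [[0, 1],
        [10, 0]]

y_true = [0, 0, 0, 0, 1, 1]
pred_a = [0, 0, 0, 0, 0, 0]  # misses both minority samples
pred_b = [1, 1, 0, 0, 1, 1]  # raises two false alarms

acc_a = accuracy(y_true, pred_a)
acc_b = accuracy(y_true, pred_b)
cost_a = total_misclassification_cost(y_true, pred_a, COST)
cost_b = total_misclassification_cost(y_true, pred_b, COST)

print(f"accuracy: a={acc_a:.3f}, b={acc_b:.3f}")
print(f"total cost: a={cost_a}, b={cost_b}")
```

Under these assumed costs, both predictors make two errors out of six, yet the first is ten times more expensive. A cost-sensitive feature selector would prefer the feature subset that leads to the second kind of behavior.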