%0 Journal Article
%T An over sampling algorithm based on clustering
%A WANG Huan (王换)
%A ZHOU Zhongmei (周忠眉)
%J Journal of Shandong University (Engineering Science) (山东大学学报(工学版))
%D 2018
%R 10.6040/j.issn.1672-3961.0.2017.416
%X To synthesize more meaningful new samples in over sampling, a clustering-based over sampling algorithm, ClusteredSMOTE-Boost, is proposed. The algorithm filters the noisy samples out of the minority class and takes each remaining minority class sample as a target sample for synthesizing new samples. The whole training set is clustered, and the weight of each target sample and the number of new samples synthesized from it are determined by the characteristics of the cluster that contains it. All target samples are then clustered; for each target sample, K nearest neighbors are selected within its cluster, and one of them is chosen at random to synthesize a new sample with the target sample. New samples are thus as similar as possible to the samples in the target sample's cluster, which reduces the boundary complexity introduced by adding samples. The experimental results show that ClusteredSMOTE-Boost clearly outperforms three classical algorithms, SMOTE-Boost, ADASYN-Boost, and BorderlineSMOTE-Boost, on every measure.
%K over sampling
%K instance weights
%K clustering
%K classification
%K imbalanced data
%U http://gxbwk.njournal.sdu.edu.cn/CN/10.6040/j.issn.1672-3961.0.2017.416