2018

Binary Ensemble Classification for Imbalanced Big Data Based on MapReduce and Upper Sampling

DOI: 10.16337/j.1004-9037.2018.03.004

Keywords: big data, imbalanced classification, upper sampling, nearest neighbor


Abstract:

This paper proposes a method for classifying two-class imbalanced big data based on MapReduce and upper sampling. The method has five steps: (1) for each positive instance, MapReduce is used to find its nearest neighbor from the opposite class; (2) several new positive instances are generated on the line segment between the two points; (3) the negative instances are randomly partitioned into subsets, using the size of the new set of positive instances as the reference; (4) each negative subset is combined with the positive set to form a balanced data subset; (5) a classifier is trained on each balanced subset with an extreme learning machine, and the trained classifiers are combined by majority voting to classify new instances. The method is compared experimentally with three related methods on five two-class imbalanced big data sets, and the results show that it outperforms all three.
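The five steps can be illustrated with a small single-machine sketch. It is not the authors' implementation, and it makes several assumptions. The nearest-neighbor search is a plain linear scan rather than a MapReduce job. The interpolation factor is drawn from [0, 0.5], because the abstract does not say where on the segment the new instances are placed. The positive set used for balancing is the original positives plus the synthetic ones. A nearest-centroid classifier stands in for the extreme learning machine. All function names are hypothetical.

```python
import math
import random


def nearest_opposite(p, negatives):
    """Step 1 (linear-scan stand-in for MapReduce): closest negative to p."""
    return min(negatives, key=lambda n: math.dist(p, n))


def oversample(positives, negatives, per_point, rng):
    """Step 2: create synthetic positives on the segment between each
    positive and its nearest negative neighbor."""
    synthetic = []
    for p in positives:
        q = nearest_opposite(p, negatives)
        for _ in range(per_point):
            # Assumption: stay on the half of the segment nearer to p.
            t = rng.uniform(0.0, 0.5)
            synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return positives + synthetic


def partition_negatives(negatives, size, rng):
    """Step 3: randomly split negatives into subsets of roughly `size`."""
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // size)
    return [shuffled[i::k] for i in range(k)]


def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def train_centroid_classifier(pos, neg):
    """Stand-in base learner (the paper uses an extreme learning machine)."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: 1 if math.dist(x, cp) <= math.dist(x, cn) else 0


def fit_ensemble(positives, negatives, per_point=2, seed=0):
    """Steps 1-5: oversample, partition, build balanced subsets, train."""
    rng = random.Random(seed)
    pos = oversample(positives, negatives, per_point, rng)
    parts = partition_negatives(negatives, len(pos), rng)
    # Step 4: each balanced subset pairs the positive set with one part.
    return [train_centroid_classifier(pos, part) for part in parts]


def predict(ensemble, x):
    """Majority vote over the base classifiers (ties go to the negative class)."""
    votes = sum(clf(x) for clf in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0
```

Calling `fit_ensemble(positives, negatives)` with lists of coordinate tuples returns a list of classifiers, and `predict(ensemble, x)` returns 1 for the positive class and 0 for the negative class. In a distributed setting, the linear scan in `nearest_opposite` is the part that step (1) of the abstract assigns to MapReduce.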
