
A Clustering-Based Sampling Method for Imbalanced Data

DOI: 10.13232/j.cnki.jnju.2015.02.029, PP. 421-429

Keywords: machine learning, imbalanced data, ensemble learning, under-sampling
CLC number: TP391.9; Document code: A


Abstract:

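Many studies have shown that, when classifying massive imbalanced data, traditional classifiers are biased toward majority-class rules, so minority-class instances are misclassified as the majority class. To address this problem, a learning and classification algorithm based on decomposition is proposed. The algorithm first clusters the sample data. On the basis of the clusters, it repeatedly under-samples the data set according to instance weights to produce balanced data sets, validates each balanced data set, and raises the weights of misclassified samples. The error rate of each base classifier is used to set that classifier's weight, and the base classifiers with better classification performance are selected for weighted ensembling. Experiments show that the algorithm achieves high minority-class accuracy and minority-class F-measure, while substantially reducing the size of the training set.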
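The pipeline outlined in the abstract (cluster, weighted under-sampling, reweighting of misclassified samples, error-weighted ensemble) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the clustering method, the base learner, the weight-update rule, or the selection threshold, so plain k-means, a nearest-centroid classifier, a weight-doubling rule, and an error cutoff of 0.5 are all assumptions here, as are every function and parameter name.

```python
import math
import random

def kmeans(points, k, rng, iters=20):
    """Plain k-means (assumed clustering step); returns a cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def train_centroid(samples):
    """Assumed base learner: predict the class whose centroid is nearest."""
    cents = {}
    for y in (0, 1):
        pts = [x for x, lab in samples if lab == y]
        cents[y] = tuple(sum(v) / len(pts) for v in zip(*pts))
    return lambda x: min(cents, key=lambda y: math.dist(x, cents[y]))

def cluster_undersample_ensemble(majority, minority, k=3, rounds=5, seed=0):
    """Hypothetical helper: returns a predict(x) function (1 = minority class)."""
    rng = random.Random(seed)
    clusters = kmeans(majority, k, rng)
    weights = [1.0] * len(majority)          # per-instance sampling weights
    data = [(x, 0) for x in majority] + [(x, 1) for x in minority]
    ensemble = []
    for _ in range(rounds):
        # Under-sample each cluster in proportion to its size, so the drawn
        # majority subset is roughly as large as the minority class.
        subset = []
        for c in range(k):
            idx = [i for i, lab in enumerate(clusters) if lab == c]
            if not idx:
                continue
            n = max(1, round(len(minority) * len(idx) / len(majority)))
            picked = rng.choices(idx, weights=[weights[i] for i in idx], k=n)
            subset += [(majority[i], 0) for i in picked]
        clf = train_centroid(subset + [(x, 1) for x in minority])
        # Validate on the full training data (an assumption) and raise the
        # weight of misclassified majority samples (assumed doubling rule).
        err = sum(clf(x) != y for x, y in data) / len(data)
        for i, x in enumerate(majority):
            if clf(x) != 0:
                weights[i] *= 2.0
        # Keep only base classifiers that beat chance; weight them by error.
        if err < 0.5:
            alpha = math.log((1 - err + 1e-9) / (err + 1e-9))
            ensemble.append((alpha, clf))

    def predict(x):
        score = sum(a if clf(x) == 1 else -a for a, clf in ensemble)
        return 1 if score > 0 else 0
    return predict

# Toy usage with synthetic data: 200 majority points, 10 minority points.
demo_rng = random.Random(1)
majority = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
minority = [(demo_rng.gauss(5, 0.5), demo_rng.gauss(5, 0.5)) for _ in range(10)]
predict = cluster_undersample_ensemble(majority, minority)
print(predict((5.0, 5.0)), predict((0.0, 0.0)))
```

Each base classifier here is trained on only about twice as many points as the minority class contains, which illustrates how this kind of scheme can shrink the training set; the exact sampling and weighting rules in the paper may differ.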

