



Application of Imbalanced Data Learning Algorithms to Similarity Learning

pp. 1138-1146

Keywords: similarity learning, support vector machine, K-nearest neighbors, imbalanced learning, resampling


Abstract:

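In real-world problems, the sample pairs used for similarity learning are imbalanced: the number of similar pairs is far smaller than the number of dissimilar pairs. To address this, this paper proposes two pair-construction methods: dissimilar K-nearest neighbors with similar K-nearest neighbors (DKNN-SKNN), and dissimilar K-nearest neighbors with similar K-farthest neighbors (DKNN-SKFN). Both methods select the pairs used for similarity learning in a targeted way. This not only speeds up support vector machine (SVM) training but also partly alleviates the imbalance between similar and dissimilar pairs. Comparative experiments against classical resampling methods on multiple datasets show that DKNN-SKNN and DKNN-SKFN perform well.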
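The abstract does not spell out how the pairs are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes three things. First, every sample forms dissimilar pairs with its k nearest samples from other classes. Second, it forms similar pairs with its k nearest (DKNN-SKNN) or k farthest (DKNN-SKFN) samples from its own class. Third, each pair is represented by the element-wise absolute difference |x_i - x_j| and labeled +1 (similar) or -1 (dissimilar) for an SVM. The function names `count_all_pairs`, `build_pairs` and `pair_dataset` are hypothetical. `count_all_pairs` shows why the imbalance arises: with 10 classes of 10 samples each, using all pairs gives 450 similar versus 4500 dissimilar pairs, a 1:10 ratio. With the k-neighbor construction, each sample contributes at most k pairs of each kind.

```python
from collections import Counter

import numpy as np


def count_all_pairs(y):
    """Number of (similar, dissimilar) unordered pairs if every pair is used."""
    n = len(y)
    similar = sum(c * (c - 1) // 2 for c in Counter(y).values())
    return similar, n * (n - 1) // 2 - similar


def _key(i, j):
    # Store each unordered pair once, as a tuple of plain ints (small, large).
    return (int(min(i, j)), int(max(i, j)))


def build_pairs(X, y, k, similar="nearest"):
    """Hypothetical DKNN-SKNN (similar="nearest") / DKNN-SKFN (similar="farthest").

    Assumed rule: dissimilar pairs = each sample with its k nearest samples of
    other classes; similar pairs = each sample with its k nearest (or farthest)
    samples of its own class. Duplicate pairs are merged.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise Euclidean distance matrix.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    sim_pairs, dis_pairs = set(), set()
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        other = np.flatnonzero(y != y[i])
        # k nearest neighbours from other classes -> dissimilar pairs.
        for j in other[np.argsort(D[i, other], kind="stable")][:k]:
            dis_pairs.add(_key(i, j))
        # k nearest (SKNN) or k farthest (SKFN) same-class samples -> similar pairs.
        order = np.argsort(D[i, same], kind="stable")
        if similar == "farthest":
            order = order[::-1]
        for j in same[order][:k]:
            sim_pairs.add(_key(i, j))
    return sorted(sim_pairs), sorted(dis_pairs)


def pair_dataset(X, sim_pairs, dis_pairs):
    """Assumed pair representation |x_i - x_j| with labels +1 / -1."""
    X = np.asarray(X, dtype=float)
    idx = np.array(list(sim_pairs) + list(dis_pairs), dtype=int).reshape(-1, 2)
    F = np.abs(X[idx[:, 0]] - X[idx[:, 1]])
    t = np.array([1] * len(sim_pairs) + [-1] * len(dis_pairs))
    return F, t


if __name__ == "__main__":
    y_big = [c for c in range(10) for _ in range(10)]
    print(count_all_pairs(y_big))  # all pairs: heavily imbalanced
    X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
    y = [0, 0, 0, 1, 1, 1]
    sim, dis = build_pairs(X, y, k=1, similar="nearest")
    F, t = pair_dataset(X, sim, dis)
    print(sim, dis, F.ravel(), t)
```

The resulting `F` and `t` could then be passed to any binary SVM, for example scikit-learn's `SVC().fit(F, t)`. Because only k neighbors per sample are used, the SVM trains on O(N·k) pairs rather than O(N²), which matches the abstract's claim of faster training.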
