-  2017 

A sparse online learning algorithm for feature selection

DOI: 10.6040/j.issn.1672-3961.1.2016.060

Keywords: big data, machine learning, online learning, sparsity, L1 norm


Abstract:

To handle massive, high-dimensional, sparse big data effectively and to improve classification efficiency, an online learning algorithm based on the sparsity principle of the L1 norm is proposed, named SFSOL (a sparse online learning algorithm for selection feature). Within an online machine learning framework, the features of high-dimensional streaming data undergo a novel "rounding" step. This step increases the sparsity of the data features while strengthening the values of some features that fall within the threshold range, which greatly improves the classification of sparse data. The performance of SFSOL is analyzed on public data sets and compared with that of three other sparse online learning algorithms. The experimental results show that SFSOL classifies high-dimensional sparse data more accurately.
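The abstract does not state the SFSOL update rule or its "rounding" step, so the following is not the authors' method. As a rough illustration of how an L1 penalty can produce sparse weights in an online setting, this minimal sketch pairs a generic hinge-loss gradient step with soft-thresholding of the weights. The names `soft_threshold` and `train_sparse_online`, the parameters `lr` and `l1`, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(w, shrink):
    """Move w toward zero by `shrink`; anything within [-shrink, shrink] becomes 0."""
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0


def train_sparse_online(stream, dim, lr=0.1, l1=0.01):
    """Single pass over (x, y) pairs.

    x is a sparse example given as {feature_index: value}; y is +1 or -1.
    Generic illustration only; NOT the SFSOL algorithm from the paper.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        margin = y * sum(w[i] * v for i, v in x.items())
        if margin <= 0:
            mistakes += 1  # prediction was wrong (or undecided) before the update
        if margin < 1:
            # Hinge-loss subgradient step on the features present in x.
            for i, v in x.items():
                w[i] += lr * y * v
        # L1 step: shrink every weight, zeroing the small ones.
        w = [soft_threshold(wi, lr * l1) for wi in w]
    return w, mistakes


# Toy stream: feature 0 signals the positive class, feature 1 the negative
# class, and feature 2 is weak noise that appears only once.
stream = [({0: 1.0, 2: 0.005}, 1)] + [({0: 1.0}, 1), ({1: 1.0}, -1)] * 50
weights, mistakes = train_sparse_online(stream, dim=3)
```

In this toy run the weight of the noise feature is driven to exactly zero by the shrinkage step, while the two informative features keep nonzero weights of opposite sign. A larger `l1` zeroes more weights at the cost of shrinking the informative ones as well.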
