2015

A Label Propagation Algorithm for Multi-Label Classification Using Hadoop Technology
(Chinese title: Hadoop框架下的多标签传播算法)

DOI: 10.7652/xjtuxb201505021

Keywords: Hadoop, multi-label classification, label propagation algorithm


Abstract:

Label propagation algorithms are graph-based semi-supervised learning methods that use the label information of labeled data to predict the labels of unlabeled data. Traditional label propagation algorithms, however, do not consider the posterior probability and do not treat the information transferred between labeled and unlabeled data differently during propagation. As a result they converge slowly, which hurts their performance. To address this shortcoming, a label propagation algorithm with differential weights is proposed, which assigns different weights according to the importance of the label information. After the problem of multiplying large-scale feature matrices is solved, the proposed algorithm is applied within the Hadoop framework using distributed computation. The result is a multi-label classification algorithm, named HSML, that can handle large-scale data and is intended to cope with the challenge of learning over the exponential-sized output space of multi-label data. HSML is compared with several well-established multi-label learning algorithms. The experimental results show that HSML performs well on the multi-label evaluation metrics and in execution speed, and that the larger the test set, the faster HSML runs.
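The abstract does not give the update rule, so the following is only a minimal single-machine sketch of classic clamped label propagation over a row-normalised Gaussian affinity graph. It is not the HSML algorithm: the differential weighting and the Hadoop distribution step are not reproduced. The function names, the `sigma` kernel width, the 0.5 decision threshold, and the toy data are all assumptions made for illustration.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Pairwise affinities exp(-||x_i - x_j||^2 / (2 * sigma^2)) with a zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_labels(points, known, num_labels, sigma=1.0, max_iters=200, tol=1e-9):
    """Return an n x num_labels score matrix.

    `known` maps a point index to its list of 0/1 label indicators.
    Assumes every point has at least one neighbour with non-zero affinity.
    """
    n = len(points)
    affinity = gaussian_affinity(points, sigma)
    # Row-normalise affinities into transition probabilities.
    transition = []
    for row in affinity:
        total = sum(row)
        transition.append([v / total for v in row])
    # Labeled rows start at their known labels, unlabeled rows at zero.
    scores = [[float(v) for v in known[i]] if i in known else [0.0] * num_labels
              for i in range(n)]
    for _ in range(max_iters):
        # One propagation step: each point averages its neighbours' scores.
        new = [[sum(transition[i][k] * scores[k][c] for k in range(n))
                for c in range(num_labels)] for i in range(n)]
        # Clamp labeled points back to their known labels.
        for i, labels in known.items():
            new[i] = [float(v) for v in labels]
        delta = max(abs(new[i][c] - scores[i][c])
                    for i in range(n) for c in range(num_labels))
        scores = new
        if delta < tol:
            break
    return scores


def predict(scores, threshold=0.5):
    """Threshold each label score independently, so a point may get several labels."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


# Toy data: two well-separated clusters on a line, one labeled point in each.
points = [[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]]
known = {0: [1, 1, 0], 5: [0, 0, 1]}  # point 0 carries two labels at once
scores = propagate_labels(points, known, num_labels=3, sigma=0.5)
predictions = predict(scores)
print(predictions)
```

Each label column is propagated independently and thresholded on its own, which is what lets an unlabeled point inherit more than one label. The dense matrix product inside the loop is the step that becomes expensive on large data, and it corresponds to the large-scale matrix multiplication the abstract says had to be solved before moving to Hadoop.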

