An Ensemble Classification Algorithm for Data Streams Based on Semi-Supervised Learning

pp. 292-299

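Keywords: attribute weights, concept drift, ensemble classifier, homogeneity, K-means clustering, semi-supervised learning, data stream classification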


Abstract:

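Most existing data stream classification algorithms rely on supervised learning and need large amounts of labeled data to train their classifiers. Because labeled data is expensive to obtain, these algorithms are of limited practical use. To address this problem, this paper proposes SEClass, an ensemble classification algorithm based on semi-supervised learning. SEClass uses a small amount of labeled data together with a large amount of unlabeled data to train and update an ensemble of classifiers, and it classifies test data by majority voting. Experimental results show that, given the same amount of labeled training data, SEClass achieves on average 5.33% higher accuracy than the latest supervised ensemble classification algorithms. Its running time grows linearly with the attribute dimensionality and the number of class labels, so it is suitable for classifying high-dimensional, high-speed data streams.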
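The majority-voting step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SEClass implementation: the abstract does not describe how the ensemble members are built, so the three threshold rules below are hypothetical stand-ins for trained classifiers, and the function name `majority_vote` is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A classifier is modeled as any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(ensemble: Sequence[Classifier], x: Sequence[float]) -> Hashable:
    """Return the label predicted by the largest number of ensemble members."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one classifier")
    votes = Counter(clf(x) for clf in ensemble)
    # most_common(1) yields the label with the highest count; on a tie,
    # the label that was encountered first wins.
    label, _ = votes.most_common(1)[0]
    return label


# Hypothetical ensemble members (simple threshold rules, not trained models).
ensemble = [
    lambda x: "pos" if x[0] > 0.5 else "neg",
    lambda x: "pos" if x[1] > 0.5 else "neg",
    lambda x: "pos" if sum(x) > 1.2 else "neg",
]

print(majority_vote(ensemble, [0.9, 0.2]))  # two of three members vote "neg"
```

In a streaming setting, the list of members would be replaced or retrained as new data chunks arrive; that update policy is outside what the abstract specifies and is not modeled here.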

