An Improved Co-training Algorithm Based on the Conditional Value of Samples

DOI: 10.3724/SP.J.1004.2013.01665, PP. 1665-1673

Keywords: machine learning, semi-supervised learning, Co-training, informative samples, conditional value


Abstract:

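Co-training is a mainstream semi-supervised learning algorithm. In it, the classifiers trained on two views iteratively select new samples from the unlabeled set for each other, and each selection updates the other classifier's training set. Co-training uses the classifier's posterior probability output as the criterion for selecting new samples, a strategy that ignores how valuable a sample is to the current classifier. To address this problem, this paper proposes an improved Co-training-style algorithm, CVCOT (Conditional value-based co-training), which optimizes Co-training with a selection strategy based on the conditional value of samples. A conditional value is defined for unlabeled samples, and the classifier under each view selects new samples according to that value in order to update the training set. This strategy both ensures the label reliability of the new samples and gives priority to adding high-value, informative samples to the training set, which effectively improves the classifiers. Experimental results on UCI datasets and on a web page classification task show that CVCOT achieves good classification performance and learning efficiency.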
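
The contrast between the two selection strategies can be illustrated with a minimal sketch. The abstract does not give the formula for conditional value, so the score below is purely an assumption for illustration, not the paper's definition: it multiplies the confidence of the classifier proposing the label (reliability) by the uncertainty of the classifier receiving the sample (informativeness). The function names, the `min_confidence` threshold, and the toy probabilities are all hypothetical.

```python
import numpy as np


def select_by_confidence(teacher_proba, k):
    """Standard Co-training selection: the k samples on which the
    labeling (teacher) classifier has the highest posterior probability."""
    confidence = teacher_proba.max(axis=1)
    return np.argsort(-confidence, kind="stable")[:k]


def select_by_value(teacher_proba, learner_proba, k, min_confidence=0.8):
    """Hypothetical value-based selection (an assumption, NOT the paper's
    conditional value): keep only samples the teacher labels reliably,
    then prefer those the receiving (learner) classifier is unsure about."""
    confidence = teacher_proba.max(axis=1)          # label reliability
    uncertainty = 1.0 - learner_proba.max(axis=1)   # informativeness for the learner
    # Samples below the reliability threshold are excluded with a -inf score.
    score = np.where(confidence >= min_confidence, confidence * uncertainty, -np.inf)
    order = np.argsort(-score, kind="stable")[:k]
    return order[np.isfinite(score[order])]


# Toy posteriors for four unlabeled samples, one row per sample (made-up numbers).
teacher_proba = np.array([[0.99, 0.01], [0.95, 0.05], [0.60, 0.40], [0.90, 0.10]])
learner_proba = np.array([[0.98, 0.02], [0.55, 0.45], [0.50, 0.50], [0.70, 0.30]])

by_confidence = select_by_confidence(teacher_proba, k=2)
by_value = select_by_value(teacher_proba, learner_proba, k=2)
print(by_confidence.tolist(), by_value.tolist())
```

On this toy input, confidence-only selection picks sample 0, which the receiving classifier already classifies with near certainty, whereas the value-based score skips it in favor of samples that are both reliably labeled and still uncertain for the receiver.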

