%0 Journal Article
%T Active Learning Based Text Categorization
%A QIN Gang-Li (覃刚力)
%A HUANG Ke (黄科)
%A YANG Jia-Ben (杨家本)
%J Computer Science (计算机科学)
%D 2003
%X In text categorization, unlabeled documents generally far outnumber labeled ones. Text categorization is a classification problem in a high-dimensional vector space, and more training samples generally improve a text classifier's accuracy. How to add unlabeled documents to the training set so as to expand it is therefore a valuable problem. This paper introduces the theory of active learning and applies it to text categorization, exploring how unlabeled documents can be used to improve classifier accuracy. The expectation is that this technique will improve accuracy by adopting a relatively large number of unlabeled document samples. We propose an active-learning-based algorithm for text categorization; experiments on the Reuters news corpus show that, when enough training samples are available, the algorithm effectively improves classifier accuracy by adopting unlabeled document samples.
%K Active learning
%K Text categorization
%K VSM
%K Machine learning
%K Text categorization algorithm
%K Feature extraction
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=C87FA256B411E504&yid=D43C4A19B2EE3C0A&vid=340AC2BF8E7AB4FD&iid=F3090AE9B60B7ED1&sid=94E7F66E6C42FA23&eid=B6DA1AC076E37400&journal_id=1002-137X&journal_name=计算机科学&referenced_num=3&reference_num=13