|
Journal of Software (软件学报), 2008
Co-Training by Committee: A Generalized Framework for Semi-Supervised Learning with Committees
Keywords: data mining, semi-supervised learning, co-training, classification, ensemble learning, decision tree, visual object recognition
Abstract: Many data mining applications have large amounts of data, but labeling that data is often difficult, expensive, or time-consuming because it requires annotation by human experts. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data to improve performance. Co-Training is a popular semi-supervised learning algorithm that assumes each example is represented by two or more redundantly sufficient sets of features (views) and that these views are independent given the class. However, these assumptions are not satisfied in many real-world application domains. In this paper, a framework called Co-Training by Committee (CoBC) is proposed, in which an ensemble of diverse classifiers is used for semi-supervised learning. The framework requires neither redundant and independent views nor different base learning algorithms. It is a general single-view semi-supervised learner that can be applied to any ensemble learner to build diverse committees. Experimental results of CoBC using Bagging, AdaBoost, and the Random Subspace Method (RSM) as ensemble learners demonstrate that error diversity among classifiers leads to an effective Co-Training style algorithm that maintains the diversity of the underlying ensemble.
|
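The abstract describes the general idea of committee-based semi-supervised learning without giving the procedure, so the following is only a minimal sketch of that idea and not the CoBC algorithm as specified in the paper. It assumes a Bagging-style committee, a nearest-centroid base learner, a class-stratified bootstrap (so every member sees every class), and vote agreement as the confidence measure. All function names and parameters (`cobc_sketch`, `per_round`, `rounds`, and so on) are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Base learner (an assumption): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_one(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def bootstrap_indices(y, rng):
    """Resample with replacement inside each class (stratification is an assumption)."""
    by_class = {}
    for i, c in enumerate(y):
        by_class.setdefault(c, []).append(i)
    idx = []
    for members in by_class.values():
        idx.extend(rng.choice(members) for _ in members)
    return idx


def bagging_committee(X, y, n_members, rng):
    """Train each member on its own bootstrap sample to obtain diverse classifiers."""
    committee = []
    for _ in range(n_members):
        idx = bootstrap_indices(y, rng)
        committee.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return committee


def committee_vote(committee, x):
    """Majority label and the fraction of members that voted for it."""
    votes = Counter(predict_one(m, x) for m in committee)
    label, count = votes.most_common(1)[0]
    return label, count / len(committee)


def cobc_sketch(X_lab, y_lab, X_unlab, n_members=5, per_round=2, rounds=10, seed=0):
    """Repeatedly pseudo-label the most agreed-upon unlabeled examples and retrain."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    committee = bagging_committee(X_lab, y_lab, n_members, rng)
    for _ in range(rounds):
        if not pool:
            break
        # Score every pooled example by committee agreement, highest first.
        scored = [(committee_vote(committee, x), i) for i, x in enumerate(pool)]
        scored.sort(key=lambda t: t[0][1], reverse=True)
        chosen = scored[:per_round]
        for (label, _), i in chosen:
            X_lab.append(pool[i])
            y_lab.append(label)
        chosen_idx = {i for _, i in chosen}
        pool = [x for i, x in enumerate(pool) if i not in chosen_idx]
        # Retrain on the enlarged labeled set.
        committee = bagging_committee(X_lab, y_lab, n_members, rng)
    return committee, X_lab, y_lab
```

A call such as `cobc_sketch(X_lab, y_lab, X_unlab)` returns the final committee together with the enlarged labeled set, and `committee_vote(committee, x)` then classifies a new example. Swapping `bagging_committee` for a boosting or random-subspace builder would mirror the abstract's point that the framework is independent of the particular ensemble learner.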