

Acta Electronica Sinica (电子学报), 2008

Improving the Co-training Text Classification Algorithm Based on Diversity Evaluation

pp. 138-143

Keywords: semi-supervised text classification, Co-training, feature views, diversity evaluation, labeled text, unlabeled text


Abstract:

The Co-training algorithm requires two feature views that satisfy the consistency and independence assumptions. In many practical applications, however, there is no natural partition that yields two views satisfying these assumptions, and directly evaluating the independence of two views is difficult. Starting from an analysis of Co-training's theoretical assumptions, this paper recasts the goal of finding two consistent and independent feature views as the problem of finding two base classifiers that are both sufficiently accurate and highly diverse. First, multiple feature views are constructed with a feature evaluation function, each containing enough information to train a base classifier. The independence of two views is then evaluated indirectly through the diversity between their base classifiers, and the two classifiers with adequate accuracy and the greatest diversity are selected for co-training. Depending on whether the same classification algorithm is used on each view, two improved algorithms, TV-SC and TV-DC, are proposed. Experiments show that both TV-SC and TV-DC clearly outperform the Co-Rnd algorithm, which splits feature views at random, and that TV-DC classifies better than TV-SC.
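The core selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper's accuracy measure Mp and diversity measure DM are not reproduced here, so plain pairwise disagreement on unlabeled data stands in as the diversity measure, and the accuracy threshold `min_acc` is a hypothetical parameter.

```python
import numpy as np

def disagreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers disagree.
    A simple pairwise diversity measure, standing in for the
    paper's DM(ht, hs)."""
    preds_a, preds_b = np.asarray(preds_a), np.asarray(preds_b)
    return float(np.mean(preds_a != preds_b))

def select_view_pair(classifiers, X_unlabeled, accuracies, min_acc=0.6):
    """Among base classifiers meeting an accuracy threshold, pick
    the pair with the highest mutual disagreement on unlabeled
    data; their feature views become the two co-training views."""
    preds = [clf.predict(X_unlabeled) for clf in classifiers]
    best, best_div = None, -1.0
    for t in range(len(classifiers)):
        for s in range(t + 1, len(classifiers)):
            if accuracies[t] < min_acc or accuracies[s] < min_acc:
                continue  # skip pairs that fail the accuracy requirement
            d = disagreement(preds[t], preds[s])
            if d > best_div:
                best, best_div = (t, s), d
    return best, best_div
```

Selecting for high disagreement among sufficiently accurate classifiers is what lets diversity serve as an indirect, computable proxy for view independence.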

Algorithm (TV-SC / TV-DC training procedure):

1) Create M feature views V1, …, VM based on TEF-WA (see Eqs. 1-9);
2) Use f and Vt(L) to create classifiers ht, t = 1, …, M;
3) Compute Mp(ht), Mp(hs) and DM(ht, hs), t, s = 1, …, M (see Eqs. 10-15);
4) Select two classifiers with certain accuracy and higher diversity according to Mp(ht), Mp(hs) and {DM(ht, hs)}; let V1 and V2 be the associated subviews;
5) Loop for r iterations:
   5.1) Create classifiers h1 and h2 using f and V1(L), V2(L) respectively;
   5.2) For each class cj do:
      5.2.1) Let b1 and b2 be unlabeled documents on which h1 and h2 make …
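One iteration of the co-training loop (step 5) can be sketched as below. This is a hedged reconstruction: step 5.2.1 is truncated in the source, so the confident-example selection follows the standard Blum-Mitchell scheme (each classifier pseudo-labels the unlabeled documents it predicts most confidently per class), and the classifier interface (`fit` / `predict_proba`) is an assumption, not the paper's API.

```python
import numpy as np

def cotrain_round(h1, h2, X1_l, X2_l, y_l, X1_u, X2_u, per_class=1):
    """One co-training iteration: retrain both classifiers on the
    labeled pool, then let each nominate its most confident
    unlabeled documents per class as pseudo-labeled additions."""
    h1.fit(X1_l, y_l)
    h2.fit(X2_l, y_l)
    p1 = h1.predict_proba(X1_u)  # shape: (n_unlabeled, n_classes)
    p2 = h2.predict_proba(X2_u)
    newly = {}
    for h_probs in (p1, p2):
        for cj in range(h_probs.shape[1]):
            # the per_class most confident unlabeled docs for class cj
            top = np.argsort(h_probs[:, cj])[-per_class:]
            for i in top:
                newly.setdefault(int(i), cj)  # keep first assignment
    return newly  # {unlabeled index: pseudo-label}
```

The returned pseudo-labeled documents would then be moved from the unlabeled pool into the labeled pool before the next of the r iterations.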

