Journal of Beijing University of Technology, 2012

Semi-Supervised Classification Based on Feature Distribution

Keywords: semi-supervised classification, feature distribution, class similarity


Abstract:

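The information gain (IG) method favors high-frequency terms and ignores the similarity between classes. To address this, a selection method based on feature distribution is proposed to correct IG, so that feature terms carrying genuinely high class-discriminating information are retained. In addition, the low efficiency of the expectation maximization (EM) algorithm is improved: unlabeled documents with high posterior class probability are moved step by step from the unlabeled document set to the labeled document set, which effectively reduces the number of iterations. Test results show that the feature-distribution-based semi-supervised learning method achieves good classification accuracy and performance on Reuters-21578 and Epinions.com, two datasets with different characteristics.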
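The abstract does not give the update rules, so the following is only a minimal sketch of the transfer idea, under stated assumptions: a multinomial naive Bayes classifier with Laplace smoothing, hard label assignment in place of EM's soft E-step, and a fixed confidence threshold. The function names, the `threshold` and `max_iter` parameters, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_nb(docs, labels, vocab, classes):
    """Fit multinomial naive Bayes; return log-priors and log-likelihoods."""
    prior, cond = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Laplace-smoothed class prior.
        prior[c] = math.log((len(class_docs) + 1) / (len(docs) + len(classes)))
        counts = Counter(w for d in class_docs for w in d if w in vocab)
        total = sum(counts.values()) + len(vocab)
        # Laplace-smoothed term likelihoods.
        cond[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return prior, cond


def posterior(doc, prior, cond):
    """Return the normalized posterior class probabilities for one document."""
    scores = {c: prior[c] + sum(cond[c][w] for w in doc if w in cond[c])
              for c in prior}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}


def incremental_transfer(labeled, labels, unlabeled, threshold=0.9, max_iter=20):
    """Move confidently classified documents from the unlabeled pool to the labeled set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    classes = sorted(set(labels))
    vocab = {w for d in labeled + pool for w in d}
    iterations = 0
    for _ in range(max_iter):
        if not pool:
            break
        iterations += 1
        prior, cond = train_nb(labeled, labels, vocab, classes)
        remaining = []
        for doc in pool:
            post = posterior(doc, prior, cond)
            best = max(post, key=post.get)
            if post[best] >= threshold:
                # Assumption: a hard label stands in for EM's soft assignment.
                labeled.append(doc)
                labels.append(best)
            else:
                remaining.append(doc)
        if len(remaining) == len(pool):
            break  # nothing was transferred, so further passes cannot help
        pool = remaining
    prior, cond = train_nb(labeled, labels, vocab, classes)
    return prior, cond, labels, pool, iterations


# Toy data (hypothetical): each document is a list of tokens.
seed_docs = [["oil", "price"], ["goal", "match"]]
seed_labels = ["econ", "sport"]
unlabeled_docs = [["oil", "price", "oil", "price"],
                  ["goal", "match", "goal", "match"]]

prior, cond, final_labels, leftover, n_iter = incremental_transfer(
    seed_docs, seed_labels, unlabeled_docs)
print(final_labels, leftover, n_iter)
```

Because the pool shrinks on every pass, each retraining step has fewer documents left to score, which is one plausible reading of how the transfer reduces the iteration count. A faithful reproduction would need the paper's actual posterior threshold and its corrected IG feature selection, neither of which appears in the abstract.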
