Classifier Ensemble Based on Two-Phase Ensemble Learning

Keywords: machine learning, data mining, text processing, classification

Abstract:

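To learn the combination function of an ensemble and thereby improve classification performance, a two-phase ensemble learning method (TPEL) is proposed. Using spam filtering, a two-class text classification problem, as the application, a series of experiments with TPEL was carried out on four public datasets. The results show that: TPEL is only slightly affected by the number of individual classifiers being combined; TPEL is markedly effective when combining multiple heterogeneous classifiers; when combining multiple homogeneous classifiers, TPEL outperforms algorithms such as naive Bayes in the great majority of cases and works well for ensembles of both stable and unstable learners; and TPEL has low time complexity.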
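
The abstract does not spell out TPEL's exact procedure. To illustrate the general two-phase idea (first train the individual classifiers, then learn the function that combines their outputs), here is a minimal sketch of generic stacking for a two-class spam filter, assuming binary term-presence features. The base learners (Bernoulli naive Bayes and a nearest-centroid scorer), the logistic-regression combiner, the 3-fold out-of-fold scheme, the toy data, and every function name (such as fit_two_phase) are hypothetical choices for this example, not the authors' TPEL.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical sketch of generic two-phase stacking; NOT the paper's TPEL.
Vector = Sequence[float]
Model = Callable[[Vector], float]  # returns a score for class 1 (spam) in [0, 1]


def sigmoid(s: float) -> float:
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_bernoulli_nb(X: List[Vector], y: List[int]) -> Model:
    """Base learner A (assumed): Bernoulli naive Bayes on binary features."""
    params = {}
    for c in (0, 1):
        rows = [x for x, lbl in zip(X, y) if lbl == c]
        prior = (len(rows) + 1) / (len(X) + 2)  # Laplace-smoothed class prior
        probs = [(sum(1 for r in rows if r[j] > 0) + 1) / (len(rows) + 2)
                 for j in range(len(X[0]))]
        params[c] = (prior, probs)

    def predict(x: Vector) -> float:
        logp = {}
        for c, (prior, probs) in params.items():
            logp[c] = math.log(prior) + sum(
                math.log(p if xj > 0 else 1.0 - p) for xj, p in zip(x, probs))
        m = max(logp.values())  # normalise in log space to avoid underflow
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)
    return predict


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Base learner B (assumed): nearest centroid, turned into a soft score."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lbl in zip(X, y) if lbl == c]
        cents[c] = ([sum(col) / len(rows) for col in zip(*rows)]
                    if rows else [0.0] * len(X[0]))

    def predict(x: Vector) -> float:
        d0, d1 = math.dist(x, cents[0]), math.dist(x, cents[1])
        # Closer to the class-1 centroid -> score nearer 1.
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)
    return predict


def train_logistic(Z: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Model:
    """Phase-2 combiner (assumed): logistic regression via batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1]) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])


def fit_two_phase(X: List[Vector], y: List[int],
                  learners: List[Callable[..., Model]], k: int = 3) -> Model:
    # Phase 1: out-of-fold scores of each base learner become meta-features.
    meta = [[0.0] * len(learners) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        for m, learn in enumerate(learners):
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, len(X), k):  # indices held out in this fold
                meta[i][m] = model(X[i])
    # Phase 2: learn the combination function on the meta-features.
    combiner = train_logistic(meta, y)
    # Refit the base learners on all data for use at prediction time.
    final = [learn(X, y) for learn in learners]
    return lambda x: combiner([f(x) for f in final])


# Hypothetical toy features: [has "free", has "winner", has "meeting", has "report"]
spam = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 0, 1]]
ham = [[0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
X = [v for pair in zip(spam, ham) for v in pair]  # interleave so each fold sees both classes
y = [1, 0] * len(spam)
predict = fit_two_phase(X, y, [train_bernoulli_nb, train_nearest_centroid])
print(round(predict([1, 1, 0, 0]), 3), round(predict([0, 0, 1, 1]), 3))
```

Feeding the combiner out-of-fold scores is a common design choice in stacking: it keeps the second phase from overtrusting base learners whose scores on their own training data are optimistically biased. Swapping in more base learners, or several copies of one learner trained on different samples, would correspond to the heterogeneous and homogeneous settings the abstract mentions.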
