A Multi-Level Automatic Classification Method for Web News on Emergency Events

Keywords: text classification, classifier, feature extraction, multi-level hierarchy, emergency events

Abstract:

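To classify Web news about emergency events more precisely, this paper studies a multi-level automatic classification method for such news. The method first gives a preliminary analysis of how emergency-event Web news can be categorized and then presents a way to build a three-level classifier: levels 1 and 2 are implemented with hand-crafted rules, while level 3 is trained with statistical learning. The paper also studies a vector space model for HTML text and a method for extracting feature terms. The classifier was trained and tested on Web news covering several emergencies, including the A(H1N1) influenza outbreak, the French air crash, and the Wenchuan earthquake. Experimental results show that the proposed method classifies these news items more accurately than the method before the improvements.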
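The abstract does not specify the rules, the level-3 learner, or how HTML structure enters the vector space model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It builds a term-weight vector from HTML (words inside `<title>` and headings get extra weight, `<script>`/`<style>` are skipped), routes a document through keyword rules for levels 1 and 2, and then applies a statistical model trained per level-2 node for level 3. Assumptions: the level-1 class names follow the four categories of emergencies in China's Emergency Response Law (natural disasters, accidents and disasters, public health incidents, social security incidents), but every keyword list, the sub-classes in `LEVEL2`, the tag weights in `TAG_WEIGHTS`, and all level-3 labels are hypothetical; multinomial naive Bayes is swapped in as the level-3 learner; text is assumed to be already split into space-separated words (Chinese pages would first need a word segmenter).

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights: words in <title> and headings count more than body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 2.0, "h3": 2.0}
SKIP_TAGS = {"script", "style"}

# Level 1: the four emergency classes of China's Emergency Response Law (keywords hypothetical).
LEVEL1 = {
    "natural_disaster": {"earthquake", "flood", "typhoon"},
    "accident_disaster": {"crash", "explosion", "collision"},
    "public_health": {"flu", "epidemic", "h1n1"},
    "social_security": {"riot", "terrorist"},
}
# Level 2: hypothetical sub-classes and keywords under each level-1 class.
LEVEL2 = {
    "natural_disaster": {"earthquake": {"earthquake", "aftershock"}, "flood": {"flood"}},
    "accident_disaster": {"air_crash": {"crash", "airliner"}, "explosion": {"explosion"}},
    "public_health": {"influenza": {"flu", "h1n1"}},
    "social_security": {"unrest": {"riot"}},
}


class WeightedTextParser(HTMLParser):
    """Builds a term-weight vector, weighting each word by the tags that enclose it."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.vector = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed ones such as <br>.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIP_TAGS for t in self.stack):
            return
        weight = max([TAG_WEIGHTS.get(t, 1.0) for t in self.stack], default=1.0)
        # Assumes space-separated words; Chinese text would be segmented first.
        for word in re.findall(r"\w+", data.lower()):
            self.vector[word] += weight


def html_vector(html):
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    return parser.vector


def match_rules(vec, rules):
    """Levels 1-2: pick the rule whose keywords carry the most weight; None if none fire."""
    scores = {label: sum(vec[k] for k in kws) for label, kws in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class NaiveBayes:
    """Level 3: multinomial naive Bayes with Laplace smoothing over weighted term counts."""

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for vec, c in zip(vectors, labels):
            self.counts[c].update(vec)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, vec):
        v = len(self.vocab)

        def log_posterior(c):
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t, n in vec.items() if t in self.vocab)

        return max(self.classes, key=log_posterior)


def classify(html, level3_models):
    """Returns (level1, level2, level3); a level is None when it cannot be decided."""
    vec = html_vector(html)
    l1 = match_rules(vec, LEVEL1)
    if l1 is None:
        return (None, None, None)
    l2 = match_rules(vec, LEVEL2[l1])
    model = level3_models.get((l1, l2))
    return (l1, l2, model.predict(vec) if model else None)
```

For example, with a toy level-3 model for the hypothetical earthquake node (labels "rescue" and "casualties" are made up for illustration):

```python
train = [
    ("<title>earthquake rescue</title><p>rescue teams reach survivors</p>", "rescue"),
    ("<title>earthquake toll</title><p>death toll rises casualties</p>", "casualties"),
]
models = {("natural_disaster", "earthquake"): NaiveBayes().fit(
    [html_vector(h) for h, _ in train], [y for _, y in train])}
print(classify("<title>earthquake</title><p>rescue teams search rubble</p>", models))
# prints ('natural_disaster', 'earthquake', 'rescue')
```

Routing documents through rules first means each level-3 model only has to separate a few closely related labels under one node, rather than every label at once.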
