%0 Journal Article
%T Advances in Machine Learning Based Text Categorization
%T 基于机器学习的文本分类技术研究进展
%A SU Jin-Shu
%A ZHANG Bo-Feng
%A XU Xin
%A 苏金树
%A 张博锋
%A 徐昕
%J Journal of Software (软件学报)
%D 2006
%X In recent years, automatic text categorization has been studied extensively and has progressed rapidly; it is one of the hotspots and key techniques in information retrieval and data mining. Highlighting the state-of-the-art challenges and research trends in content information processing for the Internet and other complex applications, this paper surveys recent developments in machine learning based text categorization, covering models, algorithms, and evaluation. It identifies nonlinearity, skewed data distribution, the labeling bottleneck, hierarchical categorization, algorithm scalability, and Web page categorization as the key problems in the study of text categorization, and discusses possible solutions to each. Finally, some future research directions are given.
%K automatic text categorization
%K machine learning
%K dimensionality reduction
%K kernel method
%K unlabeled data set
%K skewed data set
%K hierarchical categorization
%K large-scale text categorization
%K Web page categorization
%K 自动文本分类
%K 机器学习
%K 降维
%K 核方法
%K 未标注集
%K 偏斜数据集
%K 分级分类
%K 大规模文本分类
%K Web页分类
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=7762FA2AE0BF34A5&yid=37904DC365DD7266&vid=BCA2697F357F2001&iid=9CF7A0430CBB2DFD&sid=47AB527E821B9862&eid=A477487C019ACAC8&journal_id=1000-9825&journal_name=软件学报&referenced_num=75&reference_num=94