%0 Journal Article
%T Using DragPushing to Refine Concept Index for Text Categorization
%A Xueqi Cheng
%A Songbo Tan
%A Lilian Tang
%J Journal of Computer Science and Technology
%D 2006
%I
%X Concept index (CI) is a fast and efficient feature extraction (FE) algorithm for text classification. The key idea of the CI scheme is to express each document as a function of the concepts (centroids) present in the collection. However, the ability of the centroids to represent the categories of a corpus is often degraded by so-called model misfit, which arises from a number of factors in the FE process, ranging from feature selection to the similarity measure. To address this issue, this work employs the "DragPushing" strategy to refine the centroids used for the concept index. We present an extensive experimental evaluation of the refined concept index (RCI) on two English collections and one Chinese corpus using a state-of-the-art Support Vector Machine (SVM) classifier. The results indicate that in each case, RCI-based SVM yields much better performance than the normal CI-based SVM while incurring a lower computation cost during the training and classification phases.
%K text classification
%K information retrieval
%K machine learning
%K knowledge systems
%K feature extraction
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=F57FEF5FAEE544283F43708D560ABF1B&aid=732FA635F5133DCB07D837B348CF34C5&yid=37904DC365DD7266&vid=659D3B06EBF534A7&iid=E158A972A605785F&sid=106103EB0EA31435&eid=6341CCF6B158C5F9&journal_id=1000-9000&journal_name=计算机科学技术学报&referenced_num=0&reference_num=20