%0 Journal Article
%T Comparative study on text representation schemes in Chinese text classification
%T 中文文本分类中的文本表示因素比较
%A ZHANG Ai-Hua
%A JING Ji-Wu
%A XIANG Ji
%A 张爱华
%A 荆继武
%A 向继
%J 中国科学院研究生院学报
%D 2009
%X We investigated text representation methods for text classification, proposed a framework for analyzing Chinese text representation algorithms, and measured how varying individual text representation factors affects classification performance. Using single Chinese characters as features directly gives better results than expected; classification performance differs little whether articles are segmented with a smaller or a larger dictionary, or even with a more complicated segmentation algorithm; and using only 0 and 1 to indicate whether a feature is present in a text still gives reasonably good results. We also found that choosing reasonable vector values, for example through a suitable formalization algorithm, can greatly improve classification performance. These conclusions provide guidance for further applications.
%K Chinese text classification
%K text representation
%K vectorization
%K 中文文本分类
%K 文本表示
%K 向量化
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=B5EDD921F3D863E289B22F36E70174A7007B5F5E43D63598017D41BB67247657&cid=B47B31F6349F979B&jid=67CDFDECD959936E166E0F72DE972847&aid=CE0E584D206586AE2BB18A6769459900&yid=DE12191FBD62783C&vid=96C778EE049EE47D&iid=38B194292C032A66&sid=8ACD9060100C26F1&eid=8CCD0401CC9AE432&journal_id=1002-1175&journal_name=中国科学院研究生院学报&referenced_num=0&reference_num=17