%0 Journal Article
%T A semi-structured document model for text mining
%A Yang Jianwu
%A Chen Xiaoou
%A 杨建武
%A 陈晓鸥
%J Journal of Computer Science and Technology
%D 2002
%X A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can also be exploited. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which each document is represented by a vector whose elements are determined by terms, document structure, and neighboring documents. Text mining based on SLVM is described, for brevity and clarity, through the two steps of the K-means procedure: calculating document similarity and calculating cluster centers. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, with the F value increasing from 0.65-0.73 to 0.82-0.86.
%K semi-structured document
%K XML
%K text mining
%K vector space model
%K structured link vector model
%K HTML language
%K XML language
%K semi-structured document model
%K text mining
%K structural information
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=F57FEF5FAEE544283F43708D560ABF1B&aid=1A6F1253A068FD19D0F6B2C8C057D508&yid=C3ACC247184A22C1&vid=BCA2697F357F2001&iid=94C357A881DFC066&sid=2B25C5E62F83A049&eid=2B25C5E62F83A049&journal_id=1000-9000&journal_name=计算机科学技术学报&referenced_num=7&reference_num=10