%0 Journal Article
%T A Semi-Structured Document Model for Text Mining
%A Yang Jianwu
%A Chen Xiaoou
%J Journal of Computer Science and Technology
%D 2002
%X A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully exploited. To use the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which each document is represented by a vector whose elements are determined by its terms, its document structure, and its neighboring documents. For brevity and clarity, text mining based on SLVM is described through the two steps of the K-means procedure: calculating document similarity and calculating cluster centers. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, with the F value increasing from 0.65-0.73 to 0.82-0.86.
%K semi-structured document
%K XML
%K text mining
%K vector space model
%K structured link vector model
%K HTML language
%K XML language
%K semi-structured document model
%K version mining
%K structural information
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=F57FEF5FAEE544283F43708D560ABF1B&aid=1A6F1253A068FD19D0F6B2C8C057D508&yid=C3ACC247184A22C1&vid=BCA2697F357F2001&iid=94C357A881DFC066&sid=2B25C5E62F83A049&eid=2B25C5E62F83A049&journal_id=1000-9000&journal_name=计算机科学技术学报&referenced_num=7&reference_num=10